ABSTRACT


Dynamic programming is employed to obtain a solution to the problem 
of controlling a nonlinear system in an optimal fashion, subject to a 
quadratic performance index. The technique used is similar to that given 
by Merriam and Kalman for linear systems. 


For some special nonlinear systems, the solution can be computed 
by direct application of this technique. As an example, the optimal 
control system for a freely spinning body is determined. 


For more general nonlinear systems, the solution cannot be obtained 
directly. However, it is possible to obtain a solution indirectly. This is 
done by first linearizing the vector-state equations representing the 
nonlinear system. Next, dynamic programming is used to obtain an 
approximate solution based on the linearized state equations. Then an 
iterative procedure for improving the solution is presented. It can be 
shown that if the iterative procedure converges, it converges to the 
exact solution of the optimal nonlinear control problem. 


Computer example problems are given to illustrate the method and
to indicate the convergence that is usually achieved. In addition, the
performance of the optimal control system is compared with the
performance of a simple sub-optimal control system for some of the
example problems given.
CHAPTER I
SUMMARY 


1.1 Introduction 


During the past decade, a new approach to automatic control has been
developed principally as the result of work by Bellman and Kalman
in this country, and Pontryagin in the U.S.S.R. This approach, which
is now commonly called the "theory of optimal control systems," differs
from the now classical approach to automatic control of Newton, Gould,
and Kaiser, for instance, in that it uses a vector differential equation
description of the system instead of a transfer function description, and
it concentrates on the time domain methods of analysis and synthesis,
instead of frequency domain methods. The theory of optimal control has
made use of the calculus of variations, and the new but related
"dynamic programming" of Bellman, as well as the "maximum
principle" of Pontryagin.

Useful results of the application of these methods to optimal control
problems have been obtained primarily for linear systems. Useful results
have been obtained for nonlinear systems in only a few very special cases.
It is the objective of this work to extend to nonlinear systems some
techniques that have been successful in the design of controls for linear
systems.


1.2 Notation and Terminology 


An attempt has been made to keep the notation and terminology
consistent with current literature. In particular, the notation used by
Kalman has been used whenever practicable.

Vectors are designated by underlined lower case letters. All vectors
are understood to be column vectors. For example, the vector x denotes

    x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}    (1.1)





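Similarly, matrices are designated by underlined upper case letters.
For example, the matrix A denotes

    A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}    (1.2)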


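The transpose of a vector or a matrix is designated by a prime. Thus

    x' = [ x_1 \; x_2 \; \cdots \; x_n ]    (1.3)

and

    A' = \begin{bmatrix} a_{11} & a_{21} & \cdots & a_{m1} \\ a_{12} & a_{22} & \cdots & a_{m2} \\ \vdots & \vdots & & \vdots \\ a_{1n} & a_{2n} & \cdots & a_{mn} \end{bmatrix}    (1.4)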


The inner product of two vectors is denoted by x'y, and is given by

    x'y = \sum_{i=1}^{n} x_i y_i    (1.5)
Consistent with this, the square of the Euclidean norm, denoted by
\|x\|^2, is given by

    \|x\|^2 = x'x    (1.6)

The quadratic form of a vector with respect to a symmetric matrix
A is given by x'Ax. For convenience, it is often indicated by

    \|x\|^2_A = x'Ax    (1.7)
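
The weighted-norm notation of equation (1.7) recurs in every performance index that follows, so a small numerical illustration may help. The sketch below is not part of the original text; the function name `quad_form` and the list-of-lists matrix layout are assumptions made for illustration only.

```python
def quad_form(x, A):
    """Return the quadratic form ||x||_A^2 = x' A x of equation (1.7).

    x is a list of n numbers; A is an n-by-n matrix stored as a list of rows.
    (Hypothetical helper, not taken from the source.)
    """
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))


x = [1.0, 2.0]
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = [[2.0, 0.0], [0.0, 3.0]]

# With A = I the quadratic form reduces to the squared Euclidean norm x'x of (1.6).
print(quad_form(x, identity))
# With a diagonal weighting matrix each component is weighted separately.
print(quad_form(x, weights))
```

With a diagonal weighting matrix the form is simply a weighted sum of squares, which is how Q and R enter the performance indices of Section 1.3.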


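The derivative of a vector or a matrix with respect to the scalar
variable, time, is indicated by the notation,

    \dot{x} = dx/dt = \begin{bmatrix} dx_1/dt \\ dx_2/dt \\ \vdots \\ dx_n/dt \end{bmatrix}    (1.8)

and

    \dot{A} = dA/dt = \begin{bmatrix} da_{11}/dt & da_{12}/dt & \cdots & da_{1n}/dt \\ da_{21}/dt & da_{22}/dt & \cdots & da_{2n}/dt \\ \vdots & \vdots & & \vdots \\ da_{m1}/dt & da_{m2}/dt & \cdots & da_{mn}/dt \end{bmatrix}    (1.9)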


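The gradient of a scalar function of x is denoted by

    V_x(x) = grad V(x) = \begin{bmatrix} \partial V(x)/\partial x_1 \\ \partial V(x)/\partial x_2 \\ \vdots \\ \partial V(x)/\partial x_n \end{bmatrix}    (1.10)

Similarly, the Jacobian matrix of a vector function of x is denoted by

    f_x(x) = \begin{bmatrix} \partial f_1(x)/\partial x_1 & \partial f_1(x)/\partial x_2 & \cdots & \partial f_1(x)/\partial x_n \\ \partial f_2(x)/\partial x_1 & \partial f_2(x)/\partial x_2 & \cdots & \partial f_2(x)/\partial x_n \\ \vdots & \vdots & & \vdots \\ \partial f_n(x)/\partial x_1 & \partial f_n(x)/\partial x_2 & \cdots & \partial f_n(x)/\partial x_n \end{bmatrix}    (1.10a)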








1.3 Problem Statement 


Consider the system described by the vector differential equations

    \dot{x}(t) = f(x(t), u(t), t);    x(0) = c    (1.11)
    y(t) = h(x(t), t)    (1.12)


where x(t) is the system state vector and y(t) is the system output
vector. For this system, it is desired to find the control vector, u(t),
such that a performance index, J(t), is a minimum. In particular, we
will assume that J(t) has the quadratic form,

    J(t) = \int_t^T [ \frac{1}{2} \|z(\tau) - y(\tau)\|^2_{Q(\tau)} + \frac{1}{2} \|u(\tau)\|^2_{R(\tau)} ] d\tau    (1.13)

where z(τ) is the system desired output, and Q(τ) and R(τ) are
positive definite matrices weighting the system error and control effort,
respectively. We will require that the control, u(t), be expressed as
u(x(t), z(t)) so that it can be realized in a feedback configuration.

It is mathematically convenient to consider first the discrete time 
version of the same problem for the theoretical development. Actually, 
the discrete time version is a meaningful problem in its own right. It 
is this version that applies when a digital computer is used to synthesize 
the controller. 


For the discrete time problem, the equations

    x(k+1) = f(x(k), u(k), k);    x(0) = c    (1.14)
    y(k) = h(x(k), k)    (1.15)

replace equations (1.11) and (1.12), and

    J(k) = \frac{1}{2} \sum_{j=k}^{N} \|z(j) - y(j)\|^2_{Q(j)} + \frac{1}{2} \sum_{j=k}^{N-1} \|u(j)\|^2_{R(j)}    (1.16)

replaces equation (1.13).





1.4 Solution of the Discrete Time Problem 


The solution of the discrete time nonlinear optimal control problem
is sketched here. For a detailed solution, see Chapter II.

In order to proceed by dynamic programming, we define the value
function

    V_{N-k}(x(k)) = \min_{u(k), \ldots, u(N-1)} [ J(k) ]    (1.17)

Then by the "principle of optimality," it follows that

    V_{N-k}(x(k)) = \min_{u(k)} [ \frac{1}{2} \|z(k) - y(k)\|^2_{Q(k)} + \frac{1}{2} \|u(k)\|^2_{R(k)} + V_{N-k-1}(x(k+1)) ]    (1.18)
An approximate solution to this equation can be obtained by assuming

    x(k+1) \approx f(x°(k), u°(k), k) + f_x(x°(k), u°(k), k) [x(k) - x°(k)] + f_u(x°(k), u°(k), k) [u(k) - u°(k)]    (1.19)

    y(k) \approx h(x°(k), k) + h_x(x°(k), k) [x(k) - x°(k)]    (1.20)

and

    V_{N-k}(x(k)) = \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k)    (1.21)

where P(k), χ(k), and a(k) are a parametric matrix, vector, and
scalar to be determined, and where x°(k) and u°(k) are as yet
unspecified points about which we linearize.

The approximate solution obtained by combining equations (1.18),
(1.19), (1.20), and (1.21) is given by the equations

    u(k) = -[R(k) + f_u' P(k+1) f_u]^{-1} f_u' [P(k+1) f_x x(k) + P(k+1) b(k) + \chi(k+1)]    (1.22)

    P(k) = h_x' Q(k) h_x + f_x' M(k) P(k+1) f_x    (1.23)

    \chi(k) = f_x' M(k) [P(k+1) b(k) + \chi(k+1)] - h_x' Q(k) [z(k) - c(k)]    (1.24)

    a(k) = a(k+1) + \frac{1}{2} \|z(k) - c(k)\|^2_{Q(k)} + \frac{1}{2} \|b(k)\|^2_{P(k+1)} + b'(k) \chi(k+1)
           - \frac{1}{2} \|P(k+1) b(k) + \chi(k+1)\|^2_{f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u'}    (1.25)

where

    M(k) = I - P(k+1) f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u'    (1.26)
    b(k) = f - f_x x°(k) - f_u u°(k)    (1.27)

and

    c(k) = h - h_x x°(k)    (1.28)

In the above equations the arguments for f, f_x, f_u, h, and h_x have
been omitted for simplicity. They are understood to be evaluated at
x°(k), u°(k), and k, as appropriate.

The boundary conditions for equations (1.23), (1.24), and (1.25) can
be obtained from equations (1.16) and (1.21). They are

    P(N+1) = 0    (1.29)
    \chi(N+1) = 0    (1.30)
    a(N+1) = 0    (1.31)


Notice that equations (1.23), (1.24), and (1.25) must be solved
backward in time, starting at time N+1, where the boundary conditions
are known, and working backward to the present time k. This implies
that the desired output, z(k), must be known in advance so that the
parameters P and χ can be pre-computed. Once these parameters are
known, the control system can be synthesized. Figure 1.1 shows a
block diagram of the control system for the discrete time nonlinear
optimal control problem.

From figure 1.1 it can be seen that the controller for the system
consists of a time varying linear feedback portion and a director
portion. The feedback portion of the controller will ensure that the
system will be relatively insensitive to state or parameter perturbations
occurring in the system being controlled.

[Figure 1.1 - Nonlinear control system (block diagram not reproduced)]

The question of stability, which is of paramount importance in any
control system, can be answered by the use of the second method of
Lyapunov. By using this method, it can be shown that the control systems
designed from the theory presented here are always stable. A more
detailed discussion of stability is contained in Appendix B.

The theory outlined above provides only an approximately optimal
solution. How near optimal the solution is depends on how near
the vectors x°(k) and u°(k), which must be given beforehand, are to
the actual state and control vectors, x(k) and u(k). An exact solution
to the nonlinear control problem can be obtained by solving the equations
for the approximately optimal solution in an iterative fashion.

At each iteration, the x(k) and u(k) determined on the previous
iteration are used for the x°(k) and u°(k). If convergence is achieved
by this procedure, the solution obtained is the exact solution to the
nonlinear control problem. The question of under what conditions the
iterative procedure converges is still unanswered, but experience
using this algorithm on a digital computer indicates that convergence
occurs for a broad range of problems, and that convergence is usually
achieved in three or four iterations.

The theory presented in this section can be extended to systems with
stochastic disturbances by minor modifications. However, the iterative
algorithm does not produce an exact solution in this case. Details for
the problem when stochastic disturbances are present are given in
section 2.7.


1.5 Solution of the Continuous Time Problem


The equations specifying the solution to the continuous time problem
may be obtained by dynamic programming in a manner analogous to that
used for the discrete time problem. These equations are

    u(t) = -R^{-1}(t) f_u' [P(t) x(t) + \chi(t)]    (1.32)

    \dot{P}(t) = P(t) f_u R^{-1}(t) f_u' P(t) - h_x' Q(t) h_x - f_x' P(t) - P(t) f_x    (1.33)

    \dot{\chi}(t) = h_x' Q(t) [z(t) - c(t)] + P(t) f_u R^{-1}(t) f_u' \chi(t) - P(t) b(t) - f_x' \chi(t)    (1.34)

    \dot{a}(t) = \frac{1}{2} \|\chi(t)\|^2_{f_u R^{-1}(t) f_u'} - \frac{1}{2} \|z(t) - c(t)\|^2_{Q(t)} - b'(t) \chi(t)    (1.35)

with the boundary conditions

    P(T) = 0    (1.36)
    \chi(T) = 0    (1.37)
    a(T) = 0    (1.38)


Chapter III contains a full development of the theory for the continuous 
time problem. The question of system stability is discussed with reference 


to the continuous time problem in Appendix B. 


1.6 An Analytic Example 


Consider the equations of motion of a freely spinning body about
three mutually perpendicular axes,

    \dot{x}_1 = a_1 x_2 x_3 + u_1;    x_1(0) = c_1    (1.39)
    \dot{x}_2 = a_2 x_3 x_1 + u_2;    x_2(0) = c_2    (1.40)
    \dot{x}_3 = a_3 x_1 x_2 + u_3;    x_3(0) = c_3    (1.41)

where x_1, x_2, and x_3 are angular velocities, where u_1, u_2, and
u_3 are controls proportional to torques, and where

    a_1 + a_2 + a_3 = 0    (1.42)

These equations are nonlinear and coupled.

We wish to determine u_1, u_2, and u_3 such that the performance
index

    J = \frac{1}{2} \int_0^T q(t) [x_1^2 + x_2^2 + x_3^2] dt + \frac{1}{2} \int_0^T r(t) [u_1^2 + u_2^2 + u_3^2] dt    (1.43)

is a minimum.

The solution to this problem can be obtained exactly and analytically,
it turns out, if we proceed in the same manner as that indicated in the
previous section. The solution is

    u_1(t) = -k(t) x_1(t)    (1.44)
    u_2(t) = -k(t) x_2(t)    (1.45)
    u_3(t) = -k(t) x_3(t)    (1.46)

where

    k(t) = p(t)/r(t)    (1.47)

and where

    \dot{p}(t) = p^2(t)/r(t) - q(t);    p(T) = 0    (1.48)

For r(t) and q(t) constant, that is

    r(t) = r    (1.49)
    q(t) = q    (1.50)

the solution of equation (1.48) is

    p(t) = \sqrt{qr} \; \frac{e^{a\tau} - e^{-a\tau}}{e^{a\tau} + e^{-a\tau}}    (1.51)

where

    a = \sqrt{q/r}    (1.52)

and

    \tau = T - t    (1.53)

A block diagram of this control system is shown in figure 1.2.


[Figure 1.2 - Control system for the spinning body (block diagram not reproduced)]
A detailed derivation of the control equations for this system, as well as
a comparison of this control system with a sub-optimal one that uses
constant gain linear feedback, is contained in Chapter IV.

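
For constant q and r, the feedback gain of the spinning-body example can be evaluated directly. The following sketch is an illustration rather than the author's program: it writes the constant-coefficient solution of equation (1.48) in the hyperbolic-tangent form p(t) = sqrt(q r) tanh(a (T - t)), which is algebraically the same as a ratio of exponentials, and checks by central differences that this function satisfies the differential equation. The function names and the step size `h` are assumptions.

```python
import math


def riccati_gain(t, T, q, r):
    """Closed-form p(t) for constant q, r, with a = sqrt(q / r) and tau = T - t."""
    a = math.sqrt(q / r)
    return math.sqrt(q * r) * math.tanh(a * (T - t))


def feedback_gain(t, T, q, r):
    """Gain k(t) = p(t) / r of equation (1.47)."""
    return riccati_gain(t, T, q, r) / r


def riccati_residual(t, T, q, r, h=1e-6):
    """Central-difference estimate of dp/dt - (p^2 / r - q), cf. equation (1.48).

    A value near zero indicates that the closed form satisfies the equation.
    """
    dp = (riccati_gain(t + h, T, q, r) - riccati_gain(t - h, T, q, r)) / (2.0 * h)
    p = riccati_gain(t, T, q, r)
    return dp - (p * p / r - q)


if __name__ == "__main__":
    T, q, r = 5.0, 2.0, 0.5
    for t in (0.0, 2.5, 4.5, 5.0):
        print(t, feedback_gain(t, T, q, r), riccati_residual(t, T, q, r))
```

The gain falls to zero at the terminal time T, as the boundary condition p(T) = 0 requires; far from T it approaches the constant value sqrt(q/r).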
1.7 Computer Examples 


Consider the system described by the nonlinear equations

    x(k+1) = x(k) - 0.05 x^3(k) + 0.05 u(k);    x(1) = 1.0    (1.54)
    y(k) = x(k)    (1.55)

We wish to determine u(1), ..., u(99) such that the performance index

    J = \frac{1}{2} \sum_{k=1}^{100} Q [z(k) - x(k)]^2 + \frac{1}{2} \sum_{k=1}^{99} R u^2(k)    (1.56)

is a minimum.

The equations that form the basis for the iterative solution to this
problem are given by equations (1.19), (1.22), (1.23), and (1.24). In
this problem, all the variables appearing in these equations should be
interpreted as scalars. Figure 1.3 shows the results of the computer
solution of this problem for the case when R = 0.01, Q = 10.0, and
z(k) = 0 for k < 50, but z(k) = 1.0 for k > 50. The iteration
procedure converged (based on a convergence criterion of a 1 percent
change in the performance index) in three iterations. The performance
index on the third iteration was 12.272.

A sub-optimal controller, with the control determined by

    u(k) = G [z(k) - x(k)]    (1.57)

where G was equal to a constant gain of 15.0, when operated with
the same nonlinear system gave a performance index of 13.845.
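
The constant-gain comparison can be reproduced approximately with a few lines of simulation. The sketch below applies the control law (1.57) to the plant (1.54) and accumulates the performance index (1.56). It is an illustration only: the function name is hypothetical, and the value of the desired output at k = 50 is not specified in the text, so z(50) = 1.0 is an assumption. For that reason the result need not match the reported 13.845 to every digit.

```python
def simulate_constant_gain(G=15.0, Q=10.0, R=0.01, x_init=1.0, n_steps=100):
    """Simulate u(k) = G [z(k) - x(k)] on x(k+1) = x(k) - 0.05 x^3(k) + 0.05 u(k).

    Returns the performance index of equation (1.56): the error terms run over
    k = 1..n_steps and the control terms over k = 1..n_steps - 1.
    """

    def z(k):
        # Desired output: 0 before the step, 1 after. z(50) = 1.0 is an assumption.
        return 0.0 if k < 50 else 1.0

    x = x_init
    J = 0.0
    for k in range(1, n_steps + 1):
        J += 0.5 * Q * (z(k) - x) ** 2
        if k < n_steps:
            u = G * (z(k) - x)
            J += 0.5 * R * u ** 2
            x = x - 0.05 * x ** 3 + 0.05 * u
    return J


if __name__ == "__main__":
    print(simulate_constant_gain())
```

Changing `G` in this sketch shows the trade-off the constant-gain controller faces: a larger gain reduces the tracking error but raises the control cost on the first step and at the step change in z(k).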


As a second example consider the system described by the equations

    x_1(k+1) = x_1(k) + 0.01 x_2(k);    x_1(1) = 0.0    (1.58)
    x_2(k+1) = x_2(k) - 0.02 x_1(k) - 0.03 |x_2(k)| x_2(k) + 0.05 u(k);    x_2(1) = 3.0    (1.59)
    y_1(k) = x_1(k)    (1.60)
    y_2(k) = x_2(k)    (1.61)

[Figure 1.3 - Computer solution for the system x(k+1) = x(k) - 0.05 x^3(k) + 0.05 u(k) (plot not reproduced)]





Again, we wish to control this system such that the performance index

    J = \frac{1}{2} \sum_{k=1}^{100} Q_1 [z_1(k) - x_1(k)]^2 + \frac{1}{2} \sum_{k=1}^{100} Q_2 [z_2(k) - x_2(k)]^2 + \frac{1}{2} \sum_{k=1}^{99} R u^2(k)    (1.62)

is a minimum.

The two-dimensional version of equations (1.19), (1.22), (1.23),
and (1.24) forms the basis for the iterative solution procedure.
Figure 1.4 shows the results of the computer solution of this problem
for the case when R = 0.01, Q_1 = 1.0, Q_2 = 1.0, z_1(k) = 0, and
z_2(k) = 0. Convergence was achieved in four iterations, and the
performance index on the fourth iteration was 29.29.

The sub-optimal controller with u(k) determined by

    u(k) = G_1 [z_1(k) - x_1(k)] + G_2 [z_2(k) - x_2(k)]    (1.63)

with G_1 = 8.50 and G_2 = 4.75, when operated with the same nonlinear
system gave a performance index of 31.32. Chapter V contains the
results of several additional computer examples.
CHAPTER II
DISCRETE TIME SYSTEMS
2.1 Introduction 


The theory for the control of discrete time systems can be developed 
more simply than that of continuous time systems. In particular, the 
discrete time theory avoids some questions about the existence of limits, 
etc. For this reason, the discrete time theory is presented first. 

This chapter first considers linear discrete time systems thoroughly. 
Then, using the linear results as a guide, the theory is extended to include 
nonlinear systems. The exact solution of the nonlinear problem is 
presented in the form of an iterative algorithm. The final section of the 


chapter considers the problem when stochastic disturbances are present. 


2.2 Linear Systems 


The theory for the optimal control of deterministic linear systems
has been worked out by Kalman, Merriam, and others. For
this case, the system considered can be described by the equations

    x(k+1) = F(k) x(k) + G(k) u(k);    x(0) = c    (2.1)
    y(k) = H(k) x(k)    (2.2)


where x(k) is the n-dimensional system state vector, u(k) is the 
r-dimensional system control vector, and y(k) is the m-dimensional 
system output vector. 

The performance index is

    J(k) = \frac{1}{2} \sum_{j=k}^{N} \|z(j) - y(j)\|^2_{Q(j)} + \frac{1}{2} \sum_{j=k}^{N-1} \|u(j)\|^2_{R(j)}    (2.3)

where z(j) is the desired output vector.








To find the optimal control sequence, u(k), u(k+1), ..., u(N-1),
the method of dynamic programming is used. For this purpose, we
define the value function,

    V_{N-k}(x(k)) = \min_{u(k), \ldots, u(N-1)} [ J(k) ]    (2.4)

We then invoke the "principle of optimality," which states:

    "An optimal policy has the property that, whatever the
    initial state and the initial decision are, the remaining
    decisions must constitute an optimal policy with regard
    to the state resulting from the first decision."

Thus, it follows that

    V_{N-k}(x(k)) = \min_{u(k)} [ \frac{1}{2} \|z(k) - y(k)\|^2_{Q(k)} + \frac{1}{2} \|u(k)\|^2_{R(k)} + V_{N-k-1}(x(k+1)) ]    (2.5)

A solution for V_{N-k}(x(k)) and u(k), (k = 0, 1, ..., N-1), can be
obtained by assuming

    V_{N-k}(x(k)) = \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k)    (2.6)


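where P(k), χ(k), and a(k) are a parameter matrix, vector, and
scalar, respectively, to be determined. By combining equations (2.5)
and (2.6), we get

    \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k) = \min_{u(k)} [ \frac{1}{2} \|z(k) - y(k)\|^2_{Q(k)} + \frac{1}{2} \|u(k)\|^2_{R(k)}
        + \frac{1}{2} \|x(k+1)\|^2_{P(k+1)} + \chi'(k+1) x(k+1) + a(k+1) ]    (2.7)

The vector variable x(k+1) can be eliminated from this equation by
using equation (2.1). This gives

    \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k) = \min_{u(k)} [ \frac{1}{2} \|z(k) - y(k)\|^2_{Q(k)} + \frac{1}{2} \|u(k)\|^2_{R(k)}
        + \frac{1}{2} \|F(k) x(k) + G(k) u(k)\|^2_{P(k+1)} + \chi'(k+1) [F(k) x(k) + G(k) u(k)] + a(k+1) ]    (2.8)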


The minimizing value of u(k) for the expression on the right-hand
side of equation (2.8) can be determined by ordinary methods of calculus.
This value is

    u(k) = -[R(k) + G'(k) P(k+1) G(k)]^{-1} G'(k) [P(k+1) F(k) x(k) + \chi(k+1)]    (2.9)

By substituting the expression for the minimizing value of u(k) into
equation (2.8), we get

    \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k) = \frac{1}{2} \|z(k) - H(k) x(k)\|^2_{Q(k)}
        - \frac{1}{2} \|P(k+1) F(k) x(k) + \chi(k+1)\|^2_{G(k) [R(k) + G'(k) P(k+1) G(k)]^{-1} G'(k)}
        + \frac{1}{2} \|F(k) x(k)\|^2_{P(k+1)} + x'(k) F'(k) \chi(k+1) + a(k+1)    (2.10)


This equation will be satisfied for all x(k) if and only if the following
recursion equations are satisfied.

    P(k) = H'(k) Q(k) H(k) + F'(k) M(k) P(k+1) F(k)    (2.11)
    \chi(k) = F'(k) M(k) \chi(k+1) - H'(k) Q(k) z(k)    (2.12)
    a(k) = a(k+1) + \frac{1}{2} \|z(k)\|^2_{Q(k)} - \frac{1}{2} \|\chi(k+1)\|^2_{G(k) [R(k) + G'(k) P(k+1) G(k)]^{-1} G'(k)}    (2.13)

where

    M(k) = I - P(k+1) G(k) [R(k) + G'(k) P(k+1) G(k)]^{-1} G'(k)    (2.14)

The boundary conditions for this set of equations can be determined from
equations (2.3), (2.4), and (2.6) evaluated at k = N. Thus

    P(N+1) = 0    (2.15)
    \chi(N+1) = 0    (2.16)
    a(N+1) = 0    (2.17)


form the appropriate boundary conditions. 
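
The backward recursion and its boundary conditions can be checked numerically. The sketch below is an illustration under simplifying assumptions, not the author's program: every quantity is a scalar, the coefficients F, G, H, Q, and R are constant in time, and the names `backward_pass` and `closed_loop_cost` are hypothetical. It solves equations (2.11) through (2.14) backward from the boundary conditions (2.15) through (2.17), applies the control (2.9) forward in time, and compares the accumulated cost (2.3) with the value predicted by (2.6).

```python
def backward_pass(F, G, H, Q, R, z):
    """Scalar form of recursions (2.11)-(2.14); z[k] is the desired output, k = 0..N.

    Returns lists P, chi, a indexed 0..N+1, with index N+1 holding the
    boundary values (2.15)-(2.17).
    """
    N = len(z) - 1
    P = [0.0] * (N + 2)
    chi = [0.0] * (N + 2)
    a = [0.0] * (N + 2)
    for k in range(N, -1, -1):
        S = R + G * P[k + 1] * G             # R + G' P(k+1) G
        M = 1.0 - P[k + 1] * G * G / S       # equation (2.14)
        P[k] = H * Q * H + F * M * P[k + 1] * F
        chi[k] = F * M * chi[k + 1] - H * Q * z[k]
        a[k] = a[k + 1] + 0.5 * Q * z[k] ** 2 - 0.5 * chi[k + 1] ** 2 * G * G / S
    return P, chi, a


def closed_loop_cost(F, G, H, Q, R, z, P, chi, c):
    """Run the system forward from x(0) = c under control (2.9); return J(0) of (2.3)."""
    N = len(z) - 1
    x, J = c, 0.0
    for k in range(N):
        u = -G * (P[k + 1] * F * x + chi[k + 1]) / (R + G * P[k + 1] * G)
        J += 0.5 * Q * (z[k] - H * x) ** 2 + 0.5 * R * u ** 2
        x = F * x + G * u
    return J + 0.5 * Q * (z[N] - H * x) ** 2


if __name__ == "__main__":
    F, G, H, Q, R, c = 0.9, 0.5, 1.0, 2.0, 0.1, 0.3
    z = [1.0] * 11
    P, chi, a = backward_pass(F, G, H, Q, R, z)
    predicted = 0.5 * P[0] * c ** 2 + chi[0] * c + a[0]
    print(predicted, closed_loop_cost(F, G, H, Q, R, z, P, chi, c))
```

The two printed numbers should agree to rounding error, which is a convenient consistency check on any implementation of the recursion.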

Notice that equations (2.11), (2.12), and (2.13) must be solved
backwards in time. For this reason, the system must be "deterministic"
in the sense that z(j) must be known on the entire interval,
j = k, k+1, ..., N, in order to compute the optimal control vector at
time j = k. Also notice that a(k) is required to determine V_{N-k}(x(k)),
but is not required to determine u(k). Thus, if we want to synthesize
the optimal control system and are not interested in computing the
minimum value of the performance index, we need not compute the a(k)
sequence. A block diagram of the optimal linear control system is shown
in figure 2.1.

As can be seen from the block diagram, the controller consists of a
time varying linear feedback portion and a feed-forward or director
portion. The feedback signal is simply the system state vector amplified
by the time varying gain matrix P(k+1)F(k). The feed-forward signal,
χ(k), may be interpreted as a modified desired output. In other words,
the closed loop portion of the system tries to follow χ(k) instead of
z(k) because it is more economical.

From equation (2.12), it can be seen that χ(k) is derived from z(k)
by the feedback system shown in figure 2.2. As has been stated previously,
this system operates backward in time.


[Figure 2.2 - System generating χ(k) from z(k) (block diagram not reproduced; blocks H'(k)Q(k), "UNIT", and F'(k)M(k); signals z(k), χ(k), and χ(k+1))]

If the output of the system shown above follows the input reasonably
well, using -H'(k)Q(k)z(k) in place of χ(k) for the feed-forward input
to the control system of figure 2.1 should give nearly optimal performance.
This would eliminate the objectionable requirement of having to know z(j)
over the entire interval in advance.

The computational procedure for determining the optimal control is
evident from the nature of the equations. The matrices P(N), P(N-1), ...,
P(0), and the vectors χ(N), χ(N-1), ..., χ(0) must be pre-computed
by backwards recursion of equations (2.11) and (2.12). These quantities
would then be used along with x(k) to determine u(k) as the actual
control system evolves forward in time.

Another consideration concerning the optimal control system is that
of the measurement of state variables. For the preceding development,
we have tacitly assumed that the state variables are exactly measurable.
This frequently is not a reasonable assumption. For the linear problem,
Gunckel has shown that the optimal control system for the case when
the state variables are not exactly measurable consists of the control
system derived above with an optimum filter inserted in the control loop
to estimate the state variables. When the state variables are not exactly
measurable in the case of nonlinear control systems, we have no
assurance that an optimal filter to estimate the state variables inserted
in the control system will result in optimal performance. In this case,
however, as Cox has pointed out, if the state variables are not exactly
measurable, we have no alternative to determining the optimal control
system by assuming the state variables are exactly measurable and then
inserting an optimal filter in the control loop. In all that follows, we will
assume that the state variables are exactly measurable. Cox has
treated the problem of estimating state variables in noisy nonlinear
systems.


2.3 Nonlinear Systems 


The theory for the optimal control of deterministic linear systems
is extended to a fairly general class of nonlinear systems in this section.
Actually the solution derived in this section is only approximately
optimal. Section 2.4 presents an iterative procedure based on this
approximate solution that leads to the exact solution.

For the nonlinear case, the system considered can be described by
the state equations

    x(k+1) = f(x(k), u(k), k);    x(0) = c    (2.18)
    y(k) = h(x(k), k)    (2.19)


The performance index is

    J(k) = \frac{1}{2} \sum_{j=k}^{N} \|z(j) - y(j)\|^2_{Q(j)} + \frac{1}{2} \sum_{j=k}^{N-1} \|u(j)\|^2_{R(j)}    (2.20)

We follow the procedure of the previous section and define

    V_{N-k}(x(k)) = \min_{u(k), \ldots, u(N-1)} [ J(k) ]    (2.21)

By the principle of optimality, it follows that

    V_{N-k}(x(k)) = \min_{u(k)} [ \frac{1}{2} \|z(k) - y(k)\|^2_{Q(k)} + \frac{1}{2} \|u(k)\|^2_{R(k)} + V_{N-k-1}(x(k+1)) ]    (2.22)


We cannot solve this equation by direct methods; so we resort to 
linearization. 


The approximations are

    x(k+1) \approx f(x°(k), u°(k), k) + f_x(x°(k), u°(k), k) [x(k) - x°(k)] + f_u(x°(k), u°(k), k) [u(k) - u°(k)]    (2.23)

and

    y(k) \approx h(x°(k), k) + h_x(x°(k), k) [x(k) - x°(k)]    (2.24)
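
To make the linearization concrete, the sketch below evaluates the terms of equation (2.23) for a scalar system at a chosen nominal point. It is only an illustration: the text works with analytic partial derivatives, whereas this helper approximates f_x and f_u by central differences, and the name `linearize` and the step `eps` are assumptions. The returned `b` is the constant term f - f_x x° - f_u u° of the linearized state equation, as defined in equation (1.27).

```python
def linearize(f, x0, u0, k, eps=1e-6):
    """Return (f0, f_x, f_u, b) for a scalar plant f(x, u, k) at the point (x0, u0).

    f_x and f_u are central-difference estimates of the partial derivatives,
    and b = f0 - f_x * x0 - f_u * u0, so that near (x0, u0)
        f(x, u, k) ~ f_x * x + f_u * u + b.
    """
    f0 = f(x0, u0, k)
    f_x = (f(x0 + eps, u0, k) - f(x0 - eps, u0, k)) / (2.0 * eps)
    f_u = (f(x0, u0 + eps, k) - f(x0, u0 - eps, k)) / (2.0 * eps)
    b = f0 - f_x * x0 - f_u * u0
    return f0, f_x, f_u, b


def cubic(x, u, k):
    """Plant of equation (1.54)."""
    return x - 0.05 * x ** 3 + 0.05 * u


if __name__ == "__main__":
    f0, f_x, f_u, b = linearize(cubic, 1.0, 0.0, 0)
    print(f0, f_x, f_u, b)
    # Compare the linear approximation with the true plant near the nominal point.
    x, u = 0.9, 0.5
    print(cubic(x, u, 0), f_x * x + f_u * u + b)
```

The approximation is exact at the nominal point and degrades as x(k) and u(k) move away from it, which is why the choice of x°(k) and u°(k) matters.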


As before, we assume

    V_{N-k}(x(k)) = \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k)    (2.25)

By combining equations (2.22), (2.23), (2.24), and (2.25) we obtain the
single equation

    \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k) = \min_{u(k)} [ \frac{1}{2} \|z(k) - h - h_x (x(k) - x°(k))\|^2_{Q(k)} + \frac{1}{2} \|u(k)\|^2_{R(k)}
        + \frac{1}{2} \|f + f_x (x(k) - x°(k)) + f_u (u(k) - u°(k))\|^2_{P(k+1)}
        + [f + f_x (x(k) - x°(k)) + f_u (u(k) - u°(k))]' \chi(k+1) + a(k+1) ]    (2.26)

(When the arguments of f, f_x, and f_u are omitted, they are understood
to be evaluated at x°(k), u°(k), and k. Similarly, when the arguments
of h and h_x are omitted, they are understood to be evaluated at x°(k)
and k.)





The minimizing value of u(k) can be computed by the ordinary
methods of calculus, and is given by

    u(k) = -[R(k) + f_u' P(k+1) f_u]^{-1} f_u' [P(k+1) f_x x(k) + P(k+1) b(k) + \chi(k+1)]    (2.27)

where

    b(k) = f - f_x x°(k) - f_u u°(k)    (2.28)

When the minimizing value of u(k) from equation (2.27) is substituted into
equation (2.26), it becomes

    \frac{1}{2} \|x(k)\|^2_{P(k)} + \chi'(k) x(k) + a(k) = \frac{1}{2} \|z(k) - c(k) - h_x x(k)\|^2_{Q(k)}
        - \frac{1}{2} \|P(k+1) f_x x(k) + P(k+1) b(k) + \chi(k+1)\|^2_{f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u'}
        + \frac{1}{2} \|f_x x(k) + b(k)\|^2_{P(k+1)} + [f_x x(k) + b(k)]' \chi(k+1) + a(k+1)    (2.29)

where

    c(k) = h - h_x x°(k)    (2.30)


This equation will be satisfied for all x(k) if and only if the following
set of equations is satisfied:

    P(k) = h_x' Q(k) h_x + f_x' M(k) P(k+1) f_x    (2.31)

    \chi(k) = f_x' M(k) [P(k+1) b(k) + \chi(k+1)] - h_x' Q(k) [z(k) - c(k)]    (2.32)

    a(k) = a(k+1) + \frac{1}{2} \|z(k) - c(k)\|^2_{Q(k)} + \frac{1}{2} \|b(k)\|^2_{P(k+1)} + b'(k) \chi(k+1)
           - \frac{1}{2} \|P(k+1) b(k) + \chi(k+1)\|^2_{f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u'}    (2.33)

where

    M(k) = I - P(k+1) f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u'    (2.34)

The boundary values for this set of equations can be obtained in the
same manner as for the linear problem. The boundary values are

    P(N+1) = 0    (2.35)
    \chi(N+1) = 0    (2.36)
    a(N+1) = 0    (2.37)

Once the sequences of points, x°(k) and u°(k), are given, the
sequences P(k), χ(k), and a(k) may be computed by backward
recursion of equations (2.31), (2.32), and (2.33). After these quantities
have been pre-computed, the system may be operated forward in time
under the approximately optimal control given by equation (2.27). The
problem is, of course, to determine a sequence, x°(k) and u°(k),
about which to linearize such that the approximation is a good one. This
is the subject of the next section.

Figure 2.3 shows a block diagram of the nonlinear control system.
Notice that although the system being controlled is nonlinear, the
controller is time varying linear.


2.4 Solution by Iteration


The development of the theory in this section requires us to attack
the optimal nonlinear control problem from a different point of view.
Consider again the system

    x(k+1) = f(x(k), u(k), k);    x(0) = c    (2.38)
    y(k) = h(x(k), k)    (2.39)

subject to the performance criterion

    J = \frac{1}{2} \sum_{k=0}^{N} \|z(k) - y(k)\|^2_{Q(k)} + \frac{1}{2} \sum_{k=0}^{N-1} \|u(k)\|^2_{R(k)}    (2.40)

We wish to choose u(k) such that the performance criterion is a minimum.
The minimization can be performed by calculus techniques using
Lagrange multipliers.* For this purpose we define the function

    I = \frac{1}{2} \sum_{k=0}^{N} \|z(k) - h(x(k), k)\|^2_{Q(k)} + \frac{1}{2} \sum_{k=0}^{N-1} \|u(k)\|^2_{R(k)}
        + \sum_{k=0}^{N-1} \lambda'(k) [x(k+1) - f(x(k), u(k), k)] + \lambda'(-1) [x(0) - c]    (2.41)

* This approach is similar to that used by Kipiniak, and the entire
development of this section, including the iterative procedure, is closely
related to the nonlinear smoothing problem treated by Cox.
By equating partial derivatives of I with respect to u(k), λ(k), and
x(k) to zero, we obtain the following set of equations which define the
optimum control system.

    u(k) = R^{-1}(k) f_u'(x(k), u(k), k) \lambda(k)    (2.42)
    x(k+1) = f(x(k), u(k), k)    (2.43)
    \lambda(k-1) = f_x'(x(k), u(k), k) \lambda(k) + h_x' Q(k) [z(k) - h(x(k), k)]    (2.44)

The boundary conditions are

    x(0) = c    (2.45)

and

    \lambda(N) = 0    (2.46)
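
The stationarity conditions can be written out term by term. The derivation below is a sketch that assumes the adjoined function has the form I = (tracking cost) + (control cost) + \sum_k \lambda'(k)[x(k+1) - f(x(k),u(k),k)] + \lambda'(-1)[x(0) - c]; the sign convention for the multipliers is an assumption chosen so that the results agree with equations (2.42) through (2.46).

```latex
\begin{align*}
\frac{\partial I}{\partial u(k)} &= R(k)\,u(k) - f_u'\,\lambda(k) = 0
  &&\Rightarrow\quad u(k) = R^{-1}(k)\, f_u'\, \lambda(k), \\
\frac{\partial I}{\partial \lambda(k)} &= x(k+1) - f(x(k), u(k), k) = 0
  &&\Rightarrow\quad x(k+1) = f(x(k), u(k), k), \\
\frac{\partial I}{\partial x(k)} &= -h_x'\,Q(k)\,[z(k) - h(x(k), k)] + \lambda(k-1) - f_x'\,\lambda(k) = 0
  &&\Rightarrow\quad \lambda(k-1) = f_x'\,\lambda(k) + h_x'\,Q(k)\,[z(k) - h(x(k), k)], \\
\frac{\partial I}{\partial x(N)} &= -h_x'\,Q(N)\,[z(N) - h(x(N), N)] + \lambda(N-1) = 0
  &&\Rightarrow\quad \text{consistent with } \lambda(N) = 0 .
\end{align*}
```

The last line shows where the terminal condition comes from: x(N) appears in the tracking cost and in the constraint term for k = N-1, but in no later constraint, so the recursion for λ is started with λ(N) = 0.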


This set of equations is nonlinear, and an analytic solution is not
known. However, we can obtain an approximate solution by using the
linearizations

    x(k+1) \approx f(x°(k), u°(k), k) + f_x(x°(k), u°(k), k) [x(k) - x°(k)]
             + f_u(x°(k), u°(k), k) [u(k) - u°(k)]    (2.47)

and

    y(k) \approx h(x°(k), k) + h_x(x°(k), k) [x(k) - x°(k)]    (2.48)

When we use these approximations instead of equations (2.38) and (2.39),
the equations for u(k), x(k+1), and λ(k-1) become

    u(k) = R^{-1}(k) f_u' \lambda(k)    (2.49)
    x(k+1) = f + f_x [x(k) - x°(k)] + f_u [u(k) - u°(k)]    (2.50)
    \lambda(k-1) = f_x' \lambda(k) + h_x' Q(k) \{ z(k) - h - h_x [x(k) - x°(k)] \}    (2.51)

where f, f_x, and f_u are understood to be evaluated at x°(k), u°(k),
and k, and h and h_x are understood to be evaluated at x°(k) and k.
We can solve the above set of equations by assuming

    \lambda(k-1) = -P(k) x(k) - \chi(k)    (2.52)
The solution proceeds by combining equations (2.49), (2.50), (2.51), and
(2.52) to eliminate u(k), λ(k), and λ(k-1), obtaining

    x(k+1) = f + f_x [x(k) - x°(k)] - f_u u°(k) - f_u R^{-1}(k) f_u' [P(k+1) x(k+1) + \chi(k+1)]    (2.53)

and

    P(k) x(k) + \chi(k) = f_x' [P(k+1) x(k+1) + \chi(k+1)] - h_x' Q(k) \{ z(k) - h - h_x [x(k) - x°(k)] \}    (2.54)

These two equations can be combined to eliminate x(k+1), giving

    P(k) x(k) + \chi(k) = f_x' \chi(k+1) - h_x' Q(k) \{ z(k) - h - h_x [x(k) - x°(k)] \}
        + f_x' P(k+1) [I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} \{ f + f_x [x(k) - x°(k)] - f_u u°(k) - f_u R^{-1}(k) f_u' \chi(k+1) \}    (2.55)

provided the inverse indicated exists. (Section 2.5 contains a proof that
the inverse required above does indeed exist.)

Equation (2.55) will be satisfied for all x(k) if and only if the following
set of equations is satisfied.

    P(k) = f_x' P(k+1) [I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} f_x + h_x' Q(k) h_x    (2.56)

    \chi(k) = f_x' \chi(k+1) - h_x' Q(k) [z(k) - c(k)]
              - f_x' P(k+1) [I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} [f_u R^{-1}(k) f_u' \chi(k+1) - b(k)]    (2.57)


We are now in a position to obtain an exact solution to equations (2.42),
(2.43), and (2.44), and hence an exact solution to the nonlinear control
problem. The exact solution is obtained by solving equations (2.49), (2.50),
(2.52), (2.56), and (2.57) iteratively.

First, we denote the state sequence and the control sequence obtained
from the ith iteration as x_i(0), ..., x_i(N) and u_i(0), ..., u_i(N-1),
respectively. Then, for the (i+1)st iteration we linearize about the points
x_i(k) and u_i(k). The procedure for the (i+1)st iteration is as follows:

Step 1. Solve equations (2.56) and (2.57) backward in time using

    x°(k) = x_i(k)    (2.58)

and

    u°(k) = u_i(k)    (2.59)

to compute P(N), ..., P(0) and χ(N), ..., χ(0).

Step 2. Solve equations (2.49), (2.50), and (2.52) forward in
time using

    x°(k) = x_i(k)    (2.60)

and

    u°(k) = u_i(k)    (2.61)

to compute u_{i+1}(0), ..., u_{i+1}(N-1) and x_{i+1}(0), ..., x_{i+1}(N).


Steps 1 and 2 are repeated until convergence is achieved, i.e., until the
norms of the quantities [x_{i+1}(k) - x_i(k)] and [u_{i+1}(k) - u_i(k)] are less
than some previously specified convergence criteria. It can be seen by
comparing equations (2.49), (2.50), and (2.51) with equations (2.42),
(2.43), and (2.44) that if convergence is achieved using the iterative
procedure, that is if

    x_{i+1}(k) = x_i(k)    (2.62)

and

    u_{i+1}(k) = u_i(k)    (2.63)

then the solution obtained is the exact solution for equations (2.42),
(2.43), and (2.44) as well. In other words, the solution obtained by
convergence of the iterative procedure is the exact solution to the
optimal nonlinear control problem. The question of under what conditions
convergence can be assured is a difficult one, and as yet has not been
answered by the author. This remains a challenging area for possible
future research. However, computer studies using this iteration
procedure indicate that convergence usually occurs in a few iterations.
Chapter V contains some of these results.
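
The two steps can be sketched for a scalar system as follows. This is an illustrative sketch, not the author's program, and it rests on several assumptions: the output is y(k) = x(k), so that h_x = 1 and c(k) = 0; Q and R are constant; time is indexed from 0 rather than 1; the backward pass uses the recursions (2.31) and (2.32) together with the control law (2.27); the initial nominal trajectory is the constant initial state with zero control; and convergence is judged by the largest absolute change in x and u. The function names are hypothetical, and no particular iteration count is claimed for the nonlinear plant.

```python
def iterate(f, f_x, f_u, x_init, z, Q, R, tol=1e-8, max_iter=50):
    """Iterative solution for a scalar plant x(k+1) = f(x, u, k) with y = x.

    z[k], k = 0..N, is the desired output. Returns (x, u, iterations used).
    """
    N = len(z) - 1
    x_nom = [x_init] * (N + 1)   # assumed starting nominal trajectory
    u_nom = [0.0] * N
    for it in range(1, max_iter + 1):
        # Linearize about the nominal sequences, equations (2.23) and (2.28).
        fx = [f_x(x_nom[k], u_nom[k], k) for k in range(N)]
        fu = [f_u(x_nom[k], u_nom[k], k) for k in range(N)]
        b = [f(x_nom[k], u_nom[k], k) - fx[k] * x_nom[k] - fu[k] * u_nom[k]
             for k in range(N)]
        # Step 1: backward pass. At k = N the recursions reduce to
        # P(N) = Q and chi(N) = -Q z(N) because P(N+1) = chi(N+1) = 0.
        P = [0.0] * (N + 1)
        chi = [0.0] * (N + 1)
        P[N], chi[N] = Q, -Q * z[N]
        for k in range(N - 1, -1, -1):
            S = R + fu[k] * P[k + 1] * fu[k]
            M = 1.0 - P[k + 1] * fu[k] * fu[k] / S
            P[k] = Q + fx[k] * M * P[k + 1] * fx[k]
            chi[k] = fx[k] * M * (P[k + 1] * b[k] + chi[k + 1]) - Q * z[k]
        # Step 2: forward pass with the linearized state equation (2.50).
        x_new, u_new = [x_init], []
        for k in range(N):
            S = R + fu[k] * P[k + 1] * fu[k]
            u = -fu[k] * (P[k + 1] * (fx[k] * x_new[k] + b[k]) + chi[k + 1]) / S
            u_new.append(u)
            x_new.append(fx[k] * x_new[k] + fu[k] * u + b[k])
        change = max(max(abs(p - q) for p, q in zip(x_new, x_nom)),
                     max(abs(p - q) for p, q in zip(u_new, u_nom)))
        x_nom, u_nom = x_new, u_new
        if change < tol:
            return x_nom, u_nom, it
    return x_nom, u_nom, max_iter


def cubic_plant():
    """Plant of equation (1.54) with its analytic partial derivatives."""
    return (lambda x, u, k: x - 0.05 * x ** 3 + 0.05 * u,
            lambda x, u, k: 1.0 - 0.15 * x ** 2,
            lambda x, u, k: 0.05)
```

A call such as `iterate(*cubic_plant(), x_init=1.0, z=[0.0] * 49 + [1.0] * 51, Q=10.0, R=0.01)` would apply the sketch to the first example of Section 1.7, with the step time in z an assumption. For a linear plant the linearization is exact, so the second pass reproduces the first and the loop stops after two iterations.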

Because the inverse in equations (2.56) and (2.57) is generally more
difficult to compute than the inverse occurring in the solution of the last
section, we would prefer to use equations (2.31) and (2.32) as the basis
for the iterative algorithm in lieu of equations (2.56) and (2.57). However,
nothing we have shown thus far would permit us to do this and still
guarantee that a convergent solution for the iterative algorithm is the
exact solution to the nonlinear control problem.

We can show that the iteration scheme based on equations (2.31) and
(2.32) does lead to the exact solution, and in fact is identical to the scheme
based on equations (2.56) and (2.57), by using the following matrix identities.
$$[I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} = I - f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u' P(k+1) \tag{2.64}$$
and
$$[I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} f_u R^{-1}(k) f_u' = f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u' \tag{2.65}$$
(Appendix A contains a proof of these identities.) The application of
identities (2.64) and (2.65) to equations (2.56) and (2.57) immediately
transforms them into equations (2.31) and (2.32). In addition, since by
equation (2.49)
$$u(k) = -R^{-1}(k) f_u' \lambda(k) \tag{2.66}$$
or, using (2.52),
$$u(k) = -R^{-1}(k) f_u' [P(k+1) x(k+1) + \kappa(k+1)] \tag{2.67}$$
and by (2.50)
$$u(k) = -R^{-1}(k) f_u' \{P(k+1)[f + f_x (x(k) - x^*(k)) + f_u (u(k) - u^*(k))] + \kappa(k+1)\} \tag{2.68}$$
Solving this equation for u(k) explicitly yields
$$u(k) = -[R(k) + f_u' P(k+1) f_u]^{-1} f_u' \{P(k+1)[f + f_x (x(k) - x^*(k)) - f_u u^*(k)] + \kappa(k+1)\} \tag{2.69}$$
which is identical to equation (2.27). Thus we have shown the solution
based on the equations derived in this section is identical to the
solution based on the equations of the previous section.
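
As a numerical spot check (not a substitute for the proof in Appendix A), the two matrix identities of this section can be evaluated for a small case. The sketch below assumes n = 2 states and a single control, so that R(k) is a scalar; the particular values of P(k+1), f_u, and R(k) are arbitrary illustrative choices, and the second identity is taken in the form with f_u' as the last factor on both sides.

```python
# Spot check of the matrix identities for n = 2 states and one control.
# All numerical values are illustrative assumptions.

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def add(a, b, sign=1.0):
    return [[x + sign * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


I2 = [[1.0, 0.0], [0.0, 1.0]]
P = [[2.0, 0.5], [0.5, 1.0]]   # P(k+1): symmetric, non-negative definite
fu = [[1.0], [3.0]]            # f_u: 2x1 matrix
R = 4.0                        # R(k): positive scalar

fuT = transpose(fu)
fu_Rinv_fuT = [[x / R for x in row] for row in matmul(fu, fuT)]  # f_u R^-1 f_u'
lhs_inverse = inv2(add(I2, matmul(fu_Rinv_fuT, P)))              # [I + f_u R^-1 f_u' P]^-1

s = R + matmul(matmul(fuT, P), fu)[0][0]                         # R + f_u' P f_u
fu_Sinv_fuT = [[x / s for x in row] for row in matmul(fu, fuT)]  # f_u [R + f_u' P f_u]^-1 f_u'

rhs_first = add(I2, matmul(fu_Sinv_fuT, P), sign=-1.0)           # right side of first identity
lhs_second = matmul(lhs_inverse, fu_Rinv_fuT)                    # left side of second identity

err_first = max_abs_diff(lhs_inverse, rhs_first)
err_second = max_abs_diff(lhs_second, fu_Sinv_fuT)
print(err_first, err_second)
```

The practical point of the identities is visible in the code: the left sides need an n×n inverse, while the right sides need only the r×r inverse of R(k) + f_u' P(k+1) f_u, which here is a scalar division.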
2.5 On P(k) and $[I + f_u R^{-1}(k) f_u' P(k+1)]^{-1}$


This section contains two theorems of importance to the material in
this chapter. The first theorem concerns the existence of $[I + f_u R^{-1}(k) f_u' P(k+1)]^{-1}$,
and the second theorem concerns the non-negative definiteness of P(k).
The proof of these theorems will require some elementary results from
matrix theory. These are


a. If the n×n matrix P is non-negative definite, then the matrix
G'PG is non-negative definite, where G is any n×r matrix.

b. The inverse of a positive definite matrix exists and is positive
definite.

c. The sum of a positive definite matrix and a non-negative definite
matrix is positive definite.

d. The sum of two non-negative definite matrices is non-negative
definite.


Theorem 1: If R(k) is positive definite, and P(k+1) is non-negative
definite, then the inverse $[I + f_u R^{-1}(k) f_u' P(k+1)]^{-1}$ exists.


Proof: Consider the matrix expression
$$I - f_u [R(k) + f_u' P(k+1) f_u]^{-1} f_u' P(k+1) \tag{2.70}$$
If P(k+1) is non-negative definite, then by a., $f_u' P(k+1) f_u$ is non-negative
definite. If R(k) is positive definite, then by c., $[R(k) + f_u' P(k+1) f_u]$
is positive definite, and hence by b., $[R(k) + f_u' P(k+1) f_u]^{-1}$ exists. Thus
the whole expression exists. But, by the first identity of section 2.4,
$[I + f_u R^{-1}(k) f_u' P(k+1)]^{-1}$ is identical to the expression above and hence
must exist.


Theorem 2: If R(k) is positive definite, and if Q(k) and P(k+1) are
non-negative definite, then P(k) is non-negative definite.


Proof: Consider equation (2.56), rewritten here.
$$P(k) = f_x' P(k+1) [I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} f_x + h_x' Q(k) h_x \tag{2.71}$$
If Q(k) is non-negative definite, then by a., $h_x' Q(k) h_x$ is non-negative
definite. As for the first term on the right of (2.56), it will be non-negative
definite also if P(k+1) is non-negative definite. To show that
this is so, let
$$[I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} f_x = A \tag{2.72}$$
then
$$f_x = [I + f_u R^{-1}(k) f_u' P(k+1)] A \tag{2.73}$$
Thus we see that
$$f_x' P(k+1) [I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} f_x = A' [I + f_u R^{-1}(k) f_u' P(k+1)]' P(k+1) A \tag{2.74}$$
or
$$f_x' P(k+1) [I + f_u R^{-1}(k) f_u' P(k+1)]^{-1} f_x = A' P(k+1) A + A' P(k+1) f_u R^{-1}(k) f_u' P(k+1) A \tag{2.75}$$
But by a., and d., the right-hand side of equation (2.75) is non-negative
definite. Hence for the same reason, the right-hand side of equation (2.56)
is non-negative definite, completing the proof.


The hypotheses of theorem 2 are satisfied by the original assumptions
of the problem statement. The hypotheses of theorem 1 are satisfied by
the original assumptions in the problem statement, and by the results of
theorem 2. Thus theorem 1 applies to equation (2.55) in section 2.4.
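
The content of theorem 2 can be illustrated with a scalar version of equation (2.56). The sketch below is a minimal check under assumed, time-invariant values of f_x, f_u, h_x, Q, and R (all hypothetical); it sweeps backward from P(N+1) = 0 and records every P(k).

```python
# Scalar backward sweep of P(k) = f_x P(k+1) [1 + f_u R^-1 f_u P(k+1)]^-1 f_x + h_x Q h_x.
# The coefficient values used below are illustrative assumptions.

def riccati_sweep(fx, fu, hx, q, r, n_steps):
    """Return [P(N+1), P(N), ..., P(N+1-n_steps)] starting from P(N+1) = 0."""
    p = 0.0
    history = [p]
    for _ in range(n_steps):
        p = fx * p * fx / (1.0 + fu * fu * p / r) + hx * q * hx
        history.append(p)
    return history


history = riccati_sweep(fx=1.2, fu=0.5, hx=1.0, q=2.0, r=0.1, n_steps=30)
smallest = min(history)   # theorem 2 says this can never be negative
print(smallest, history[-1])
```

With R > 0 and Q >= 0 the denominator is never smaller than one, so every term stays non-negative, as the theorem asserts for the matrix case.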
2.6 An Alternative Linearization Procedure


There are other possible linearization procedures that can be applied 
to the nonlinear control problem. One procedure suggested by Pearson?” 
has the advantage of being computationally simpler than the methods of 
sections 2.3 and 2.4, but it is theoretically less attractive. 

To present the theory for this method, we follow the approach used in 
section 2.3. However, instead of the linearization used there, we use the 
following linearizations. 


$$x(k+1) \approx F(x^*(k), u^*(k), k)\, x(k) + G(x^*(k), u^*(k), k)\, u(k) \tag{2.76}$$
$$y(k) \approx H(x^*(k), k)\, x(k) \tag{2.77}$$
where F and G are determined such that
$$f(x(k), u(k), k) = F(x(k), u(k), k)\, x(k) + G(x(k), u(k), k)\, u(k) \tag{2.78}$$
$$h(x(k), k) = H(x(k), k)\, x(k) \tag{2.79}$$
This type of linearization is not unique, and it is an open question as to
which linearization of this type is best. However, in many instances
there is an obvious intuitively appealing way to proceed.

As an example of such a linearization, consider the scalar nonlinear
function
$$f(x(k), u(k), k) = -x^3(k) + u^2(k) \tag{2.80}$$
One possible linearization is
$$f(x(k), u(k), k) \approx [-x^{*2}(k)]\, x(k) + [u^*(k)]\, u(k) \tag{2.81}$$
Another one, arbitrarily chosen, is
$$f(x(k), u(k), k) \approx [-x^{*2}(k) - u^*(k)]\, x(k) + [u^*(k) + x^*(k)]\, u(k) \tag{2.82}$$
The first, of course, is intuitively more appealing.
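
The non-uniqueness can be checked directly. The sketch below reads the example function as f = -x^3 + u^2 (an interpretation of equation (2.80)) and verifies that both choices of the pair (F, G) reproduce f exactly when they are evaluated at the point itself, which is the requirement of equation (2.78). The function names are hypothetical.

```python
# Two different (F, G) pairs that both satisfy f = F x + G u at the evaluation point.

def f(x, u):
    """Example nonlinear function, read here as f = -x**3 + u**2."""
    return -x ** 3 + u ** 2


def linearization_first(x_star, u_star):
    """F and G of the intuitively appealing choice."""
    return -x_star ** 2, u_star


def linearization_second(x_star, u_star):
    """F and G of the arbitrarily chosen alternative."""
    return -x_star ** 2 - u_star, u_star + x_star


def defect(linearization, x, u):
    """Difference between F x + G u and f when F, G are evaluated at (x, u)."""
    F, G = linearization(x, u)
    return F * x + G * u - f(x, u)


points = [(0.5, 1.0), (-1.5, 0.3), (2.0, -0.7)]
worst = max(abs(defect(lin, x, u))
            for lin in (linearization_first, linearization_second)
            for x, u in points)
print(worst)
```

The two pairs differ only by the terms -u* x and +x* u, which cancel when x = x* and u = u*; away from the nominal point they give different approximations.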
By using the linearizations outlined above instead of equations (2.23)
and (2.24), equation (2.26) becomes
$$\tfrac{1}{2}\|x(k)\|^2_{P(k)} + \kappa'(k)x(k) + a(k) = \min_{u(k)} \Big\{ \tfrac{1}{2}\|z(k) - Hx(k)\|^2_{Q(k)} + \tfrac{1}{2}\|u(k)\|^2_{R(k)} + \tfrac{1}{2}\|Fx(k) + Gu(k)\|^2_{P(k+1)} + [Fx(k) + Gu(k)]'\kappa(k+1) + a(k+1) \Big\} \tag{2.83}$$
(When the arguments of F, G, and H are omitted, they are understood
to be evaluated at the points x*(k), u*(k), and k.)
The minimizing value of u(k) is
$$u(k) = -[R(k) + G'P(k+1)G]^{-1} G' [P(k+1) F x(k) + \kappa(k+1)] \tag{2.84}$$
When this value of u(k) is substituted into equation (2.83), we get
$$\tfrac{1}{2}\|x(k)\|^2_{P(k)} + \kappa'(k)x(k) + a(k) = \tfrac{1}{2}\|z(k) - Hx(k)\|^2_{Q(k)} - \tfrac{1}{2}\|P(k+1)Fx(k) + \kappa(k+1)\|^2_{G[R(k) + G'P(k+1)G]^{-1}G'} + \tfrac{1}{2}\|Fx(k)\|^2_{P(k+1)} + x'(k)F'\kappa(k+1) + a(k+1) \tag{2.85}$$
This equation will be satisfied for all x(k) if and only if the following
equations are satisfied.
$$P(k) = H'Q(k)H + F'M(k)P(k+1)F \tag{2.86}$$
$$\kappa(k) = F'M(k)\kappa(k+1) - H'Q(k)z(k) \tag{2.87}$$
$$a(k) = a(k+1) + \tfrac{1}{2}\|z(k)\|^2_{Q(k)} - \tfrac{1}{2}\|\kappa(k+1)\|^2_{G[R(k) + G'P(k+1)G]^{-1}G'} \tag{2.88}$$
where
$$M(k) = I - P(k+1)G[R(k) + G'P(k+1)G]^{-1}G' \tag{2.89}$$
The boundary conditions are again
$$P(N+1) = 0 \tag{2.90}$$
$$\kappa(N+1) = 0 \tag{2.91}$$
$$a(N+1) = 0 \tag{2.92}$$


As can be seen, these equations are identical in form to the solution
equations for the linear system. The only difference is that the matrices
F, G, and H in this section are functions of x*(k) and u*(k) as well
as of k.

An iterative type solution, similar to that introduced in section 2.4,
is possible here also. However, we cannot show that this iterative solution
converges to the exact optimal nonlinear solution. The reason for this
can be seen by comparing equation (2.44) of the exact optimal nonlinear
solution, rewritten here,
$$\lambda(k-1) = f_x'(x(k), u(k), k)\,\lambda(k) + h_x' Q(k) [z(k) - h(x(k), k)] \tag{2.44}$$
with the equation corresponding to equation (2.51) when the approximations
of this section are used. This equation would be
$$\lambda(k-1) = F'\lambda(k) + H'Q(k)[z(k) - Hx(k)] \tag{2.93}$$
It is obvious that equation (2.93) will not approach equation (2.44) as
x*(k) approaches x(k), since F and H are in general not the partial
derivatives f_x and h_x. Thus the convergent solution of the iteration
procedure based on the equations of this section will not in general be
the exact optimal solution. We could only hope that this solution would
be very near the true optimum.
2.7 Nonlinear Systems with Stochastic Disturbances 


This section presents a technique for controlling a nonlinear system 
that is subject to stochastic disturbances. Such a system can be described 


by the equations 
$$x(k+1) = f(x(k), u(k), k) + r(k); \quad x(0) = c \tag{2.94}$$
$$y(k) = h(x(k), k) \tag{2.95}$$
where r(k) is an n-dimensional random vector such that r(j) is
independent of r(k) for j ≠ k. Thus r(k) is essentially a discrete
time equivalent of white noise.

If the nonlinear system we are interested in controlling is disturbed 
by a random input that is not independent as described above, but instead 
is disturbed by a random vector that can be described by the difference 
equation 


$$r(k+1) = \phi(r(k), k) + w(k); \quad r(0) = w \tag{2.96}$$
where w(k) is an independent random sequence, then the system
equations can be transformed into the form of (2.94) and (2.95) by
augmenting the state variables. This can best be illustrated by a simple
example.
Suppose the system is described by the equations
$$x(k+1) = x(k)\,u(k)\,r(k) \tag{2.97}$$
and
$$r(k+1) = a\,r(k) + w(k) \tag{2.98}$$
where w(k) is an independent scalar random variable. We can define an
augmented state vector
$$x(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} = \begin{bmatrix} x(k) \\ r(k) \end{bmatrix} \tag{2.99}$$
and write the system equations as
$$x(k+1) = f(x(k), u(k), k) + r(k) \tag{2.100}$$
where
$$f(x(k), u(k), k) = \begin{bmatrix} x_1(k)\,u(k)\,x_2(k) \\ a\,x_2(k) \end{bmatrix} \tag{2.101}$$
and
$$r(k) = \begin{bmatrix} 0 \\ w(k) \end{bmatrix} \tag{2.102}$$
which is in the form of equation (2.94).
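
The augmentation in this example can be exercised numerically. The sketch below steps the original pair of equations and the augmented form side by side with the same control and the same disturbance samples; the coefficient a, the control sequence, the initial values, and the Gaussian choice for w(k) are all illustrative assumptions.

```python
import random

# Original description: x(k+1) = x(k) u(k) r(k),  r(k+1) = a r(k) + w(k).
def step_original(x, r, u, a, w):
    return x * u * r, a * r + w


# Augmented description: state (x1, x2) = (x, r), disturbance vector (0, w).
def f_augmented(state, u, a):
    x1, x2 = state
    return (x1 * u * x2, a * x2)


def step_augmented(state, u, a, w):
    fx = f_augmented(state, u, a)
    noise = (0.0, w)
    return (fx[0] + noise[0], fx[1] + noise[1])


random.seed(0)
a = 0.8
x, r = 1.5, 0.4            # assumed initial values
state = (x, r)
worst = 0.0
for k in range(25):
    u = 1.0 + 0.1 * k      # any known control sequence will do
    w = random.gauss(0.0, 1.0)
    x, r = step_original(x, r, u, a, w)
    state = step_augmented(state, u, a, w)
    worst = max(worst, abs(state[0] - x), abs(state[1] - r))
print(worst)
```

Only the second component of the augmented disturbance is random, and successive samples of w(k) are independent, so the augmented system meets the independence requirement placed on r(k) in equation (2.94).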
Because the variables involved in equations (2.94) and (2.95) are
stochastic, a reasonable performance index will involve an expectation.
Thus we assume the performance index is
$$J(k) = \mathop{\mathrm{Exp}}_{r(k), \ldots, r(N-1)} \Big\{ \sum_{j=k}^{N} \Big[ \tfrac{1}{2}\|z(j) - y(j)\|^2_{Q(j)} + \tfrac{1}{2}\|u(j)\|^2_{R(j)} \Big] \Big\} \tag{2.103}$$
In order to proceed by dynamic programming, we define the value function
$$V_k(x(k)) = \min_{u(k), \ldots, u(N-1)} [J(k)] \tag{2.104}$$
Bellman shows that when the r(k) sequence is independent, the
principle of optimality implies
$$V_k(x(k)) = \min_{u(k)} \mathop{\mathrm{Exp}}_{r(k)} \Big\{ \tfrac{1}{2}\|z(k) - y(k)\|^2_{Q(k)} + \tfrac{1}{2}\|u(k)\|^2_{R(k)} + V_{k+1}(x(k+1)) \Big\} \tag{2.105}$$


As before, if we assume
$$x(k+1) \approx f + f_x [x(k) - x^*(k)] + f_u [u(k) - u^*(k)] + r(k) \tag{2.106}$$
$$y(k) \approx h + h_x [x(k) - x^*(k)] \tag{2.107}$$
and
$$V_k(x(k)) = \tfrac{1}{2}\|x(k)\|^2_{P(k)} + \kappa'(k)x(k) + a(k) \tag{2.108}$$
then we obtain
$$\tfrac{1}{2}\|x(k)\|^2_{P(k)} + \kappa'(k)x(k) + a(k) = \min_{u(k)} \mathop{\mathrm{Exp}}_{r(k)} \Big\{ \tfrac{1}{2}\|z(k) - h_x x(k) - c(k)\|^2_{Q(k)} + \tfrac{1}{2}\|u(k)\|^2_{R(k)} + \tfrac{1}{2}\|f_x x(k) + f_u u(k) + b(k) + r(k)\|^2_{P(k+1)} + [f_x x(k) + f_u u(k) + b(k) + r(k)]'\kappa(k+1) + a(k+1) \Big\} \tag{2.109}$$
where
$$b(k) = f - f_x x^*(k) - f_u u^*(k) \tag{2.110}$$
Performing the expectation operation and then the minimization operation
yields, assuming $\mathrm{Exp}_{r(k)}[r(k)] = 0$,
$$u(k) = -[R(k) + f_u'P(k+1)f_u]^{-1} f_u' [P(k+1) f_x x(k) + P(k+1) b(k) + \kappa(k+1)] \tag{2.111}$$
and
$$\tfrac{1}{2}\|x(k)\|^2_{P(k)} + \kappa'(k)x(k) + a(k) = \tfrac{1}{2}\|z(k) - h_x x(k) - c(k)\|^2_{Q(k)} + \tfrac{1}{2}\mathop{\mathrm{Exp}}_{r(k)}[r'(k)P(k+1)r(k)] + a(k+1) - \tfrac{1}{2}\|P(k+1) f_x x(k) + P(k+1) b(k) + \kappa(k+1)\|^2_{f_u[R(k) + f_u'P(k+1)f_u]^{-1}f_u'} + \tfrac{1}{2}\|f_x x(k) + b(k)\|^2_{P(k+1)} + [f_x x(k) + b(k)]'\kappa(k+1) \tag{2.112}$$
where
$$c(k) = h - h_x x^*(k) \tag{2.113}$$
This equation will be satisfied for all x(k) if and only if the following
equations are satisfied:
$$P(k) = h_x'Q(k)h_x + f_x'M(k)P(k+1)f_x \tag{2.114}$$
$$\kappa(k) = f_x'M(k)[P(k+1)b(k) + \kappa(k+1)] - h_x'Q(k)[z(k) - c(k)] \tag{2.115}$$
$$a(k) = a(k+1) + \tfrac{1}{2}\|z(k) - c(k)\|^2_{Q(k)} + \tfrac{1}{2}\|b(k)\|^2_{P(k+1)} + b'(k)\kappa(k+1) + \tfrac{1}{2}\mathop{\mathrm{Exp}}_{r(k)}[r'(k)P(k+1)r(k)] - \tfrac{1}{2}\|P(k+1)b(k) + \kappa(k+1)\|^2_{f_u[R(k) + f_u'P(k+1)f_u]^{-1}f_u'} \tag{2.116}$$
where
$$M(k) = I - P(k+1) f_u [R(k) + f_u'P(k+1)f_u]^{-1} f_u' \tag{2.117}$$
The boundary values are
$$P(N+1) = 0 \tag{2.118}$$
$$\kappa(N+1) = 0 \tag{2.119}$$
$$a(N+1) = 0 \tag{2.120}$$
These equations are identical to equations (2.31) through (2.37) except for
the additional expectation term in equation (2.116).

In essence, these equations are the solution to the optimal control
problem for the linearized system. This solution differs from the exact
optimal nonlinear solution because the linearized system only approximates
the nonlinear system. In section 2.4 we were able to improve this
approximation by an iterative technique so that eventually the exact solution was
obtained. What are the prospects of a similar procedure in this case?
An examination of the iterative procedure of section 2.4 reveals that
the technique was dependent on being able to predict exactly the state at
time k+1 which results from the application of a known control signal to
the system in a known state at time k. Unfortunately, because of the
random disturbance, r(k), this is impossible for the system considered
in this section.

We can, however, use the following iterative algorithm to obtain an 
approximate solution: 

Step 1. Solve equations (2.114) and (2.115) backward in time using
$$x^*(j) = x_i(j) \tag{2.121}$$
$$u^*(j) = u_i(j) \tag{2.122}$$
to compute P(N), ..., P(k) and $\kappa(N), \ldots, \kappa(k)$.

Step 2. Solve equations (2.111) and (2.106) forward in time with
$$r(j) = 0; \quad j \ge k \tag{2.123}$$
and again using
$$x^*(j) = x_i(j) \tag{2.124}$$
$$u^*(j) = u_i(j) \tag{2.125}$$
to compute $u_{i+1}(k), \ldots, u_{i+1}(N-1)$ and $x_{i+1}(k), \ldots, x_{i+1}(N)$.

The i+1st iteration would then proceed using the extrapolated control
vectors, $u_{i+1}(j)$, and the extrapolated state vectors, $x_{i+1}(j)$, just
computed in place of u*(j) and x*(j). The procedure would be repeated
until satisfactory convergence had been achieved.

This algorithm should provide nearly optimal performance when the
P(j) and $\kappa(j)$ obtained in this fashion are used to generate the control
for the real system. As time goes on, and the true state deviates more
and more from the extrapolated state, the performance will slowly be
degraded.

One way to overcome partially this degradation of performance is to
update the solution periodically by measuring the current state of the
system, and then using this state as the starting point for a recomputation
of P(j) and $\kappa(j)$, using the same iterative procedure as before. Of
course, this would require that the iterative algorithm be executed in
much faster time than the real system evolves.

Some computer results using this approach are presented in
Chapter V.
CHAPTER III


CONTINUOUS TIME SYSTEMS


3.1 Introduction


Even a cursory examination of the results of Chapter II shows that
the control systems required by the theory are of such complexity that a
high speed digital computer will generally be required to investigate or
to synthesize the control system. However, for the few analog control
system applications that may be possible, and for a few special nonlinear
control problems that can be solved analytically, a continuous time theory
is required.

The purpose of this chapter is to develop the theory for the control
of continuous time nonlinear systems. This theory is developed in a
manner analogous to that used in Chapter II for the discrete time systems.
It should be mentioned here that Kalman? and Merriam?!5/!$5 have developed 


the theory for linear continuous time systems. 


3.2 Linear Systems


Consider the linear control system described by the equations 
$$\dot{x}(t) = F(t)x(t) + G(t)u(t); \quad x(0) = c \tag{3.1}$$
$$y(t) = H(t)x(t) \tag{3.2}$$
where x(t) is the n-dimensional system state vector, u(t) is the
r-dimensional control or input vector, and y(t) is the m-dimensional
system output vector. As indicated by the notation, the transformation
matrices F(t), G(t) and H(t) as well as the vectors x(t), u(t), and
y(t) can vary continuously with time. For this system we wish to find
the control u(τ) on the interval t ≤ τ ≤ T such that the performance
index
$$J(t) = \int_t^T \Big[ \tfrac{1}{2}\|z(\tau) - y(\tau)\|^2_{Q(\tau)} + \tfrac{1}{2}\|u(\tau)\|^2_{R(\tau)} \Big] d\tau \tag{3.3}$$
is a minimum. Here, z(τ) is the desired output of the system.


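We define the value function
$$V(x(t), t) = \min_{u(\tau),\; t \le \tau \le T} [J(t)] \tag{3.4}$$
By the principle of optimality, we have
$$V(x(t), t) = \min_{u(\tau),\; t \le \tau \le t + \Delta t} \Big\{ \int_t^{t+\Delta t} \Big[ \tfrac{1}{2}\|z(\tau) - y(\tau)\|^2_{Q(\tau)} + \tfrac{1}{2}\|u(\tau)\|^2_{R(\tau)} \Big] d\tau + V(x(t+\Delta t), t+\Delta t) \Big\} \tag{3.5}$$
If we expand V(x(t+Δt), t+Δt) in a Taylor series about the point
[x(t), t], we get
$$V(x(t), t) = \min_{u(\tau),\; t \le \tau \le t + \Delta t} \Big\{ \int_t^{t+\Delta t} \Big[ \tfrac{1}{2}\|z(\tau) - y(\tau)\|^2_{Q(\tau)} + \tfrac{1}{2}\|u(\tau)\|^2_{R(\tau)} \Big] d\tau + V(x(t), t) + V_t(x(t), t)\,\Delta t + V_x'(x(t), t)[x(t+\Delta t) - x(t)] + \cdots \Big\} \tag{3.6}$$
When we take the limit as Δt approaches zero (provided it exists, etc.),
equation (3.6) becomes
$$\min_{u(t)} \Big\{ \tfrac{1}{2}\|z(t) - y(t)\|^2_{Q(t)} + \tfrac{1}{2}\|u(t)\|^2_{R(t)} + V_t + V_x'\dot{x}(t) \Big\} = 0 \tag{3.7}$$
or
$$\min_{u(t)} \Big\{ \tfrac{1}{2}\|z(t) - H(t)x(t)\|^2_{Q(t)} + \tfrac{1}{2}\|u(t)\|^2_{R(t)} + V_t + V_x'[F(t)x(t) + G(t)u(t)] \Big\} = 0 \tag{3.8}$$
The minimization can be performed by ordinary methods of calculus,
yielding
$$u(t) = -R^{-1}(t)G'(t)V_x \tag{3.9}$$
Substituting this value of u(t) into equation (3.8) yields the Hamilton-
Jacobi equation,
$$V_t + \tfrac{1}{2}\|z(t) - H(t)x(t)\|^2_{Q(t)} - \tfrac{1}{2}\|V_x\|^2_{G(t)R^{-1}(t)G'(t)} + V_x'F(t)x(t) = 0 \tag{3.10}$$
The solution for this equation can be obtained by assuming
$$V(x(t), t) = \tfrac{1}{2}\|x(t)\|^2_{P(t)} + \kappa'(t)x(t) + a(t) \tag{3.11}$$
Hence,
$$V_t = \tfrac{1}{2}\|x(t)\|^2_{\dot{P}(t)} + \dot{\kappa}'(t)x(t) + \dot{a}(t) \tag{3.12}$$
and
$$V_x = P(t)x(t) + \kappa(t) \tag{3.13}$$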
After substituting these expressions into equation (3.10), we obtain
$$\tfrac{1}{2}\|x(t)\|^2_{\dot{P}(t)} + \dot{\kappa}'(t)x(t) + \dot{a}(t) + \tfrac{1}{2}\|z(t) - H(t)x(t)\|^2_{Q(t)} - \tfrac{1}{2}\|P(t)x(t) + \kappa(t)\|^2_{G(t)R^{-1}(t)G'(t)} + [P(t)x(t) + \kappa(t)]'F(t)x(t) = 0 \tag{3.14}$$
This equation can be satisfied for all x(t) if and only if the following
equations are satisfied:
$$\dot{P}(t) = P(t)G(t)R^{-1}(t)G'(t)P(t) - P(t)F(t) - F'(t)P(t) - H'(t)Q(t)H(t) \tag{3.15}$$
$$\dot{\kappa}(t) = [P(t)G(t)R^{-1}(t)G'(t) - F'(t)]\kappa(t) + H'(t)Q(t)z(t) \tag{3.16}$$
$$\dot{a}(t) = \tfrac{1}{2}\|\kappa(t)\|^2_{G(t)R^{-1}(t)G'(t)} - \tfrac{1}{2}\|z(t)\|^2_{Q(t)} \tag{3.17}$$
The boundary conditions for these equations can be obtained from
equations (3.3) and (3.11). They are
$$P(T) = 0 \tag{3.18}$$
$$\kappa(T) = 0 \tag{3.19}$$
$$a(T) = 0 \tag{3.20}$$
Here again, these equations must be solved backwards in time, but they
do not depend on the state of the system. Therefore, they can be
precomputed, as in the discrete time case, if the desired output, z(τ), is
known on the interval t ≤ τ ≤ T. The control can be realized in the form
of the block diagram shown in figure 3.1.


Figure 3.1 - Continuous Time Optimal Linear Control System
3.3 Nonlinear Systems


The nonlinear systems we consider here can be described by the
equations
$$\dot{x}(t) = f(x(t), u(t), t); \quad x(0) = c \tag{3.21}$$
$$y(t) = h(x(t), t) \tag{3.22}$$
where x(t), u(t), and y(t) are state, control, and output vectors, as
before, and where f(x(t), u(t), t) and h(x(t), t) are continuous time
vector valued functions. It is necessary to assume that f and h
satisfy certain differentiability conditions in what follows. Whenever
derivatives of these functions appear, we will tacitly assume that they
exist.

For the system just described, we wish to find the control, u(τ),
on the interval, t ≤ τ ≤ T, such that the performance index
$$J(t) = \int_t^T \Big[ \tfrac{1}{2}\|z(\tau) - y(\tau)\|^2_{Q(\tau)} + \tfrac{1}{2}\|u(\tau)\|^2_{R(\tau)} \Big] d\tau \tag{3.23}$$
is a minimum.

We define the value function
$$V(x(t), t) = \min_{u(\tau),\; t \le \tau \le T} [J(t)] \tag{3.24}$$
Then by the principle of optimality,
$$V(x(t), t) = \min_{u(\tau),\; t \le \tau \le t + \Delta t} \Big\{ \int_t^{t+\Delta t} \Big[ \tfrac{1}{2}\|z(\tau) - y(\tau)\|^2_{Q(\tau)} + \tfrac{1}{2}\|u(\tau)\|^2_{R(\tau)} \Big] d\tau + V(x(t+\Delta t), t+\Delta t) \Big\} \tag{3.25}$$
By expanding V(x(t+Δt), t+Δt) in a Taylor series about x(t) and t,
and then taking the limit as Δt approaches 0, we get
$$\min_{u(t)} \Big\{ \tfrac{1}{2}\|z(t) - y(t)\|^2_{Q(t)} + \tfrac{1}{2}\|u(t)\|^2_{R(t)} + V_t + V_x'\dot{x}(t) \Big\} = 0 \tag{3.26}$$
Since the system is nonlinear, we cannot solve this Hamilton-Jacobi
equation directly in general. So, as in Chapter II, we resort to
linearization. We use the approximations
$$\dot{x}(t) \approx f(x^*(t), u^*(t), t) + f_x(x^*(t), u^*(t), t)[x(t) - x^*(t)] + f_u(x^*(t), u^*(t), t)[u(t) - u^*(t)] \tag{3.27}$$
and
$$y(t) \approx h(x^*(t), t) + h_x(x^*(t), t)[x(t) - x^*(t)] \tag{3.28}$$
With these approximations, equation (3.26) becomes
$$\min_{u(t)} \Big\{ \tfrac{1}{2}\|z(t) - h_x x(t) - c(t)\|^2_{Q(t)} + \tfrac{1}{2}\|u(t)\|^2_{R(t)} + V_t + V_x'[f_x x(t) + f_u u(t) + b(t)] \Big\} = 0 \tag{3.29}$$
The minimization operation yields
$$u(t) = -R^{-1}(t) f_u' V_x \tag{3.30}$$
and
$$V_t + \tfrac{1}{2}\|z(t) - h_x x(t) - c(t)\|^2_{Q(t)} - \tfrac{1}{2}\|V_x\|^2_{f_u R^{-1}(t) f_u'} + V_x'[f_x x(t) + b(t)] = 0 \tag{3.31}$$
where
$$b(t) = f - f_x x^*(t) - f_u u^*(t) \tag{3.32}$$
and
$$c(t) = h - h_x x^*(t) \tag{3.33}$$
A solution for equation (3.31) can be obtained by assuming
$$V(x(t), t) = \tfrac{1}{2}\|x(t)\|^2_{P(t)} + \kappa'(t)x(t) + a(t) \tag{3.34}$$
which implies
$$V_t = \tfrac{1}{2}\|x(t)\|^2_{\dot{P}(t)} + \dot{\kappa}'(t)x(t) + \dot{a}(t) \tag{3.35}$$
and
$$V_x(x(t), t) = P(t)x(t) + \kappa(t) \tag{3.36}$$
Combining equations (3.31), (3.35), and (3.36) yields
$$\tfrac{1}{2}\|x(t)\|^2_{\dot{P}(t)} + \dot{\kappa}'(t)x(t) + \dot{a}(t) + \tfrac{1}{2}\|z(t) - h_x x(t) - c(t)\|^2_{Q(t)} - \tfrac{1}{2}\|P(t)x(t) + \kappa(t)\|^2_{f_u R^{-1}(t) f_u'} + [P(t)x(t) + \kappa(t)]'[f_x x(t) + b(t)] = 0 \tag{3.37}$$
This equation will be satisfied for all x(t) if and only if the following
equations are satisfied:
$$\dot{P}(t) = P(t) f_u R^{-1}(t) f_u' P(t) - P(t) f_x - f_x' P(t) - h_x' Q(t) h_x \tag{3.38}$$
$$\dot{\kappa}(t) = [P(t) f_u R^{-1}(t) f_u' - f_x']\kappa(t) - P(t)b(t) + h_x' Q(t)[z(t) - c(t)] \tag{3.39}$$
$$\dot{a}(t) = \tfrac{1}{2}\|\kappa(t)\|^2_{f_u R^{-1}(t) f_u'} - \tfrac{1}{2}\|z(t) - c(t)\|^2_{Q(t)} - \kappa'(t)b(t) \tag{3.40}$$
The boundary conditions are the same as for the linear case.

If we are given x*(t) and u*(t), we can compute P(t) and $\kappa(t)$ in
advance. Then these parameters can be used to determine a near optimum
control for the system. Of course, how near optimal the control system
is depends on how good the approximations (3.27) and (3.28) are.
Computationally, we can proceed in a manner analogous to the discrete
time iterative procedure. To do this, we can use x(t) and u(t)
determined by the ith iteration as x*(t) and u*(t) for the i+1st iteration.
Similar to the iterative algorithm of section 2.4, this algorithm can be
shown to yield the exact solution to the continuous time nonlinear control
problem.

The control system can be synthesized in the form of the block
diagram of figure 3.2. As can be seen from figure 3.2, the continuous
time control system is almost identical in form to the discrete time
nonlinear control system.

Some additional insight into the problem of optimal control can be
gained by examining the nature of the equations for P(t) and $\kappa(t)$. As
the quantity, T-t, approaches zero, P(t) and $\kappa(t)$ approach zero.
Hence, the optimum control signal approaches zero as the terminal time
nears. On the other hand, when T-t is very large, and the system
being controlled is linear time invariant, $\dot{P}(t)$ is very small. We would
expect that when T-t is very large, and when the time variations and
nonlinearities of the system being controlled are not severe, $\dot{P}(t)$
should be small, also. The director part of the input, $\kappa(t)$, is derived
from the desired output, z(t), by the feedback system shown in
figure 3.3.
Figure 3.3 - Block Diagram of System for $\kappa(t)$


If the output of this system follows the input reasonably well, the 
system synthesized using e (t) z (t) in place of x (t) might perform 
near optimally, provided b(t) and c(t) are reasonably small in 
magnitude. 

The comments above have been imprecise, and were meant only
to convey some insight into the problem beyond the bare mathematical
statements.
CHAPTER IV
CONSERVATIVE SYSTEMS
4.1 Introduction


A special class of nonlinear systems, which we shall call "conservative,"
can be treated analytically and exactly by the methods introduced in
Chapters II and III. The purpose of this chapter is to study this class
of nonlinear problems by means of two examples. Often, as much can
be learned from the study of one analytic example as from a hundred
numerical examples.
4.2 General 


Consider the nonlinear system
$$\dot{x} = f(x) + u; \quad x(0) = c \tag{4.1}$$
If the performance criterion is
$$J = \int L(x, u)\, dt \tag{4.2}$$
then the loss equation, equation (3.26), is
$$V_t + \min_{u(t)} [L(x, u) + V_x' f(x) + V_x' u] = 0 \tag{4.3}$$
If the term, $V_x' f(x)$, in equation (4.3) vanishes identically for all x,
it is possible for a great simplification to result. Of course V, and
hence $V_x$, depend strongly on the form of L(x, u). Thus $V_x' f(x)$ will
vanish only if L(x, u) has a special form. Fortunately, this is sometimes
the case in practical problems. The example problems which follow will
serve to illustrate the nature of the special form L(x, u) must have to
permit this simplification. In addition, the example problems will permit
us to study the analytic solutions of some optimal nonlinear control
problems, and compare them with some sub-optimal solutions.





4.3 Spinning Body Problem 


The equations of motion for the angular velocities of a freely spinning
body about three mutually perpendicular axes can be written as
$$\begin{aligned} \dot{x}_1 &= a_1 x_2 x_3; \quad x_1(0) = c_1 \\ \dot{x}_2 &= a_2 x_1 x_3; \quad x_2(0) = c_2 \\ \dot{x}_3 &= a_3 x_1 x_2; \quad x_3(0) = c_3 \end{aligned} \tag{4.4}$$
where $x_1$, $x_2$, and $x_3$ are the angular velocities, and where
$$a_1 + a_2 + a_3 = 0 \tag{4.4A}$$
These equations of motion are coupled and nonlinear. If we wish to
control the spin of this system by exerting torques about each of the
three axes, the equations of motion become
$$\begin{aligned} \dot{x}_1 &= a_1 x_2 x_3 + u_1; \quad x_1(0) = c_1 \\ \dot{x}_2 &= a_2 x_1 x_3 + u_2; \quad x_2(0) = c_2 \\ \dot{x}_3 &= a_3 x_1 x_2 + u_3; \quad x_3(0) = c_3 \end{aligned} \tag{4.5}$$
where $u_1$, $u_2$, and $u_3$ are the control variables proportional to the
torques.
If we wish to reduce the angular velocities to a minimum, subject
to a constraint on the control effort expended, an appropriate performance
criterion might be
$$J = \int_0^T \Big\{ \tfrac{1}{2} q(t)[x_1^2 + x_2^2 + x_3^2] + \tfrac{1}{2} r(t)[u_1^2 + u_2^2 + u_3^2] \Big\}\, dt \tag{4.6}$$

Optimal Control 


The control which minimizes J can be found by the method of
Chapter III. The loss equation is
$$\min_{u(t)} \Big\{ \tfrac{1}{2} q(t)[x_1^2 + x_2^2 + x_3^2] + \tfrac{1}{2} r(t)[u_1^2 + u_2^2 + u_3^2] + V_t + V_{x_1}(a_1 x_2 x_3 + u_1) + V_{x_2}(a_2 x_1 x_3 + u_2) + V_{x_3}(a_3 x_1 x_2 + u_3) \Big\} = 0 \tag{4.7}$$


The spinning body control problem has been treated by Athans
and Windeknecht, but their methods differ from that used here in
significant respects.
If we assume
$$V = \tfrac{1}{2} p(t)[x_1^2 + x_2^2 + x_3^2] \tag{4.8}$$
then the optimal control is
$$\begin{aligned} u_1 &= -k(t)x_1 \\ u_2 &= -k(t)x_2 \\ u_3 &= -k(t)x_3 \end{aligned} \tag{4.9}$$
where
$$k(t) = p(t)/r(t), \tag{4.10}$$
and equation (4.7) becomes
$$\tfrac{1}{2}[\dot{p}(t) - p^2(t)/r(t) + q(t)][x_1^2 + x_2^2 + x_3^2] = 0 \tag{4.11}$$
Since this equation must be true for all $x_1$, $x_2$, and $x_3$, we must
have
$$\dot{p}(t) - p^2(t)/r(t) + q(t) = 0 \tag{4.12}$$
From the definition of V, the boundary condition is
$$p(T) = 0 \tag{4.13}$$
If q and r are constant, the solution for equation (4.12) is
$$p(\tau) = ra\,\frac{1 - e^{-2a\tau}}{1 + e^{-2a\tau}} \tag{4.14}$$
where
$$a = \sqrt{q/r} \tag{4.15}$$
and
$$\tau = T - t \tag{4.16}$$
A plot of p(τ) is shown in figure 4.1.


Figure 4.1 - Plot of p(τ)/ra versus aτ
Notice that the optimal controller is linear with time varying gains
even though the system controlled is nonlinear. Also notice that the time
varying gains reach 76 percent of their steady-state value in τ = 1/a
seconds, 96 percent in τ = 2/a seconds, and 99.5 percent in
τ = 3/a seconds. As is evident, the quantity, 1/a, plays the role of
a time constant.
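
The gain history can be checked numerically. The sketch below uses the closed form p(τ) = r a tanh(aτ), which is algebraically the same as a ratio of the form (1 - e^{-2aτ})/(1 + e^{-2aτ}) multiplied by r a, and confirms by a central difference that it satisfies equation (4.12) with p = 0 at τ = 0. The values of q and r are arbitrary illustrative choices.

```python
import math

def p_optimal(tau, q, r):
    """Closed-form gain p(tau) = r a tanh(a tau), with a = sqrt(q / r) and tau = T - t."""
    a = math.sqrt(q / r)
    return r * a * math.tanh(a * tau)


def riccati_residual(tau, q, r, h=1e-6):
    """Left side of p_dot - p**2 / r + q = 0, using dp/dt = -dp/dtau."""
    dp_dtau = (p_optimal(tau + h, q, r) - p_optimal(tau - h, q, r)) / (2.0 * h)
    return -dp_dtau - p_optimal(tau, q, r) ** 2 / r + q


q, r = 2.0, 0.5                      # illustrative constants, so a = 2
a = math.sqrt(q / r)
residuals = [abs(riccati_residual(tau, q, r)) for tau in (0.1, 0.5, 1.0, 2.0)]

# Fraction of the steady-state value r a reached after one, two, and three time constants.
fractions = [p_optimal(n / a, q, r) / (r * a) for n in (1, 2, 3)]
print(max(residuals), fractions)
```

The three fractions are tanh(1), tanh(2), and tanh(3), or roughly 0.76, 0.96, and 0.995.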

The controller may be realized in the form of the block diagram of
figure 4.2.


Figure 4.2 - Spinning Body Control System Block Diagram


Sub-optimal Control 


It is instructive to compare the optimal control system of the last
section with the sub-optimal control system which simply uses constant
gains. In order to make this comparison, the performance criterion
must be computed for the optimal and sub-optimal controls on the time
interval (0, T).

For the optimal control the performance criterion is
$$J^\circ = V(c, 0) \tag{4.17}$$
or, for this problem,
$$J^\circ = \tfrac{1}{2} p(0)[c_1^2 + c_2^2 + c_3^2] \tag{4.18}$$
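For the sub-optimal control with
$$\begin{aligned} u_1 &= -kx_1 \\ u_2 &= -kx_2 \\ u_3 &= -kx_3 \end{aligned} \tag{4.19}$$
the performance criterion is
$$J = \tfrac{1}{2}[q + rk^2] \int_0^T [x_1^2 + x_2^2 + x_3^2]\, dt \tag{4.20}$$
or
$$J = \tfrac{1}{2}[q + rk^2] \int_0^T W(t)\, dt \tag{4.21}$$
where
$$W(t) = x_1^2(t) + x_2^2(t) + x_3^2(t) \tag{4.22}$$
It is possible to compute W(t) from equation (4.5) in the following
manner:
$$\begin{aligned} x_1\dot{x}_1 &= a_1 x_1 x_2 x_3 - kx_1^2 \\ x_2\dot{x}_2 &= a_2 x_1 x_2 x_3 - kx_2^2 \\ x_3\dot{x}_3 &= a_3 x_1 x_2 x_3 - kx_3^2 \end{aligned} \tag{4.23}$$
Adding, and using (4.4A), we get
$$x_1\dot{x}_1 + x_2\dot{x}_2 + x_3\dot{x}_3 = -k[x_1^2 + x_2^2 + x_3^2] \tag{4.24}$$
or
$$\dot{W}(t) + 2kW(t) = 0; \quad W(0) = c_1^2 + c_2^2 + c_3^2 \tag{4.25}$$
The solution of equation (4.25) is
$$W(t) = e^{-2kt}\, W(0) \tag{4.26}$$
From this the sub-optimal performance criterion may be computed. This
gives
$$J = \frac{q + rk^2}{4k}\,[1 - e^{-2kT}]\,[c_1^2 + c_2^2 + c_3^2] \tag{4.27}$$
For $k = a = \sqrt{q/r}$, J becomes
$$J = \tfrac{1}{2} ra\,[1 - e^{-2aT}]\,[c_1^2 + c_2^2 + c_3^2] \tag{4.28}$$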
The ratio, J/J°, is then simply
$$J/J^\circ = 1 + e^{-2aT} \tag{4.29}$$
A plot of J/J° is shown in figure 4.3.


Figure 4.3 - J/J° versus aT


The maximum value of J/J° is 2 when T, the control interval, is
infinitesimal, and the ratio tends to unity as T increases. In fact, when
T is just 1/a seconds, the ratio is only 1.13.
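
The decay law of equation (4.26) can be checked by integrating the equations of motion (4.5) with the constant-gain control u = -kx. The sketch below uses a fixed-step fourth-order Runge-Kutta integrator; the coefficients (chosen so that they sum to zero), the initial velocities, the gain, and the step count are illustrative assumptions.

```python
import math

def spin_rates(x, a, k):
    """Right side of the controlled equations of motion with u_i = -k x_i."""
    x1, x2, x3 = x
    a1, a2, a3 = a
    return (a1 * x2 * x3 - k * x1,
            a2 * x1 * x3 - k * x2,
            a3 * x1 * x2 - k * x3)


def rk4_step(x, a, k, dt):
    """One classical Runge-Kutta step."""
    def shifted(base, slope, scale):
        return tuple(b + scale * s for b, s in zip(base, slope))
    s1 = spin_rates(x, a, k)
    s2 = spin_rates(shifted(x, s1, dt / 2.0), a, k)
    s3 = spin_rates(shifted(x, s2, dt / 2.0), a, k)
    s4 = spin_rates(shifted(x, s3, dt), a, k)
    return tuple(xi + dt / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
                 for xi, d1, d2, d3, d4 in zip(x, s1, s2, s3, s4))


def final_w(c, a, k, t_end, steps=2000):
    """Integrate from x(0) = c to t_end and return W = x1^2 + x2^2 + x3^2."""
    x = tuple(c)
    dt = t_end / steps
    for _ in range(steps):
        x = rk4_step(x, a, k, dt)
    return sum(xi * xi for xi in x)


a_coeff = (1.0, -2.0, 1.0)     # assumed coefficients with a1 + a2 + a3 = 0
c = (1.0, 0.5, -0.8)           # assumed initial angular velocities
gain, t_end = 0.7, 2.0
w0 = sum(ci * ci for ci in c)
w_free = final_w(c, a_coeff, 0.0, t_end)    # no control: W stays constant
w_ctrl = final_w(c, a_coeff, gain, t_end)   # constant gain: W decays as exp(-2 k t)
print(w0, w_free, w_ctrl, w0 * math.exp(-2.0 * gain * t_end))
```

With no control the nonlinear terms cancel in the sum and W is conserved, which is the property that makes the system "conservative" in the sense of this chapter; with the gain switched on, only the linear term -2kW survives.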

To get an indication of the sensitivity of the performance index, we can
compute the sub-optimal control with k = ½a and compare the results with
those for k = a.

The value of the performance index for k = ½a is given by
$$J = \frac{5}{8}\,\frac{q}{a}\,[1 - e^{-aT}]\,[c_1^2 + c_2^2 + c_3^2] \tag{4.30}$$
or, since $a = \sqrt{q/r}$,
$$J = \tfrac{5}{8}\, ra\,[1 - e^{-aT}]\,[c_1^2 + c_2^2 + c_3^2] \tag{4.31}$$
For this case, the ratio, J/J°, is
$$J/J^\circ = \frac{5}{4}\,\frac{[1 + e^{-2aT}]}{[1 + e^{-aT}]} \tag{4.32}$$
A plot of this ratio is also shown in figure 4.3.

We can see from figure 4.3 that the constant gain sub-optimal control
provides a nearly optimal system. The gain setting with k = a would be
better if the control interval is much greater than 1/a, and k = ½a
would be better if the control interval is much less than 1/a. In any case
the system is relatively insensitive to variations in the gain setting, and
this is the reason that the optimal control system is little better than the
constant gain sub-optimal systems.
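
The comparison can be reproduced with a few lines of arithmetic. The sketch below evaluates the constant-gain criterion obtained from W(t) = e^{-2kt} W(0), the optimal criterion J° = ½ p W(0) with p = r a tanh(aT), and their ratio; the numerical values of q, r, and T are illustrative assumptions, chosen so that aT = 1.

```python
import math

def j_optimal(q, r, T, w0):
    """Optimal criterion: one half of p times W(0), with p = r a tanh(a T)."""
    a = math.sqrt(q / r)
    return 0.5 * r * a * math.tanh(a * T) * w0


def j_constant_gain(q, r, k, T, w0):
    """Constant-gain criterion: (q + r k^2) / (4 k) times [1 - exp(-2 k T)] times W(0)."""
    return (q + r * k * k) / (4.0 * k) * (1.0 - math.exp(-2.0 * k * T)) * w0


def ratio(q, r, k, T):
    """Performance ratio J / J-optimal; W(0) cancels."""
    return j_constant_gain(q, r, k, T, 1.0) / j_optimal(q, r, T, 1.0)


q, r = 4.0, 1.0                    # illustrative weights, so a = 2
a = math.sqrt(q / r)
T = 1.0 / a                        # control interval of one time constant
ratio_full_gain = ratio(q, r, a, T)        # k = a
ratio_half_gain = ratio(q, r, 0.5 * a, T)  # k = a / 2
print(ratio_full_gain, ratio_half_gain)
```

For k = a the ratio reduces to 1 + e^{-2aT}, and for k = ½a to (5/4)(1 + e^{-2aT})/(1 + e^{-aT}); the first falls from 2 toward 1 as aT grows, while the second is 5/4 at both extremes and smaller in between.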
Terminal Control 


If we desire to reduce the angular velocities of the spinning body to
a minimum at the terminal time only, subject to a constraint on the
control effort expended, an appropriate performance criterion might be
$$J = \tfrac{1}{2} q\,[x_1^2(T) + x_2^2(T) + x_3^2(T)] + \int_0^T \tfrac{1}{2} r\,[u_1^2 + u_2^2 + u_3^2]\, dt \tag{4.33}$$
The results of the sub-section on optimal control apply directly to
this problem if we let
$$q(t) = q\,u_0(t - T) \tag{4.34}$$
where $u_0(t)$ is the unit impulse function.
The equation for p(t) then is
$$\dot{p}(t) - p^2(t)/r = 0; \quad p(T) = q \tag{4.35}$$
The solution of equation (4.35) is
$$p(t) = \frac{q}{1 + a\tau} \tag{4.36}$$
where again
$$\tau = T - t \tag{4.37}$$
and
$$a = q/r \tag{4.38}$$
Thus the optimal value of the performance index is
$$J^\circ = \frac{1}{2}\,\frac{q\,W(0)}{1 + aT} \tag{4.39}$$
A plot of p(t) is shown in figure 4.4.


Figure 4.4 - Plot of p(t)/q versus aτ


Again, it is interesting to compare the optimal controller with
a sub-optimal constant gain linear controller. In terms of the
constant gain, k, the performance criterion for the sub-optimal
controller is
$$J = \tfrac{1}{2} q\,W(0)\Big[ e^{-2kT} + \frac{k}{2a}\,(1 - e^{-2kT}) \Big] \tag{4.40}$$
If the gain, k, is set equal to a, the sub-optimal performance
criterion, J, approaches the optimal performance criterion, J°,
for very short control intervals. In this case, the ratio, J/J°,
is
$$J/J^\circ = \tfrac{1}{2}(1 + aT)(e^{-2aT} + 1) \tag{4.41}$$
A plot of J/J° is shown in figure 4.5.
Figure 4.5 - Performance Ratio, J/J°, for Terminal Control

It should be noted that the value for k in this example was chosen
to give near optimal performance over relatively short control intervals.
Better performance could be achieved over longer control intervals with
a lower gain setting. For instance if k = ½a, the value of the performance
criterion is
$$J = \tfrac{1}{2} q\,W(0)\Big[ \tfrac{1}{4} + \tfrac{3}{4}\, e^{-aT} \Big] \tag{4.42}$$
The ratio, J/J°, then is
$$J/J^\circ = \tfrac{1}{4}(1 + aT)(1 + 3e^{-aT}) \tag{4.43}$$
A plot of this is shown in figure 4.5 also. A plot is also shown for
k = ¼a.

As can be seen from the plot, there exists a constant gain for any
particular value of control interval which will give very nearly optimal
performance. For instance, with a control interval of 1/a, k = ½a
will give a performance index of about 1.05 times the true optimal
performance index.
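
The terminal-control comparison can be reproduced in the same way. In the sketch below the constant-gain criterion is built directly from W(t) = e^{-2kt} W(0): a terminal term ½ q W(T) plus the integral of ½ r k² W(t). The values of q, r, and T are illustrative assumptions, chosen so that aT = 1 with a = q/r.

```python
import math

def j_terminal_optimal(q, r, T, w0):
    """Optimal terminal-control criterion: one half of q W(0) / (1 + a T), a = q / r."""
    a = q / r
    return 0.5 * q * w0 / (1.0 + a * T)


def j_terminal_constant_gain(q, r, k, T, w0):
    """Terminal penalty plus integrated control effort for u_i = -k x_i."""
    decay = math.exp(-2.0 * k * T)
    terminal = 0.5 * q * w0 * decay
    effort = 0.5 * r * k * k * w0 * (1.0 - decay) / (2.0 * k)
    return terminal + effort


def terminal_ratio(q, r, k, T):
    return j_terminal_constant_gain(q, r, k, T, 1.0) / j_terminal_optimal(q, r, T, 1.0)


q, r = 3.0, 1.5            # illustrative weights, so a = q / r = 2
a = q / r
T = 1.0 / a                # control interval of 1/a
ratio_k_a = terminal_ratio(q, r, a, T)            # k = a
ratio_k_half = terminal_ratio(q, r, 0.5 * a, T)   # k = a / 2
print(ratio_k_a, ratio_k_half)
```

At aT = 1 the gain k = ½a gives a ratio of ¼(2)(1 + 3/e), a little over 1.05, while k = a gives ½(2)(1 + e^{-2}), about 1.14.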
4.4 Nonlinear Spring Problem


The equation of motion for a mass attached to a cubic spring can be
written as
$$\ddot{x} + x^3 = 0 \tag{4.44}$$
If control is exerted, i.e., the system is forced, the equation is
$$\ddot{x} + x^3 = u \tag{4.45}$$
This equation can be written as the system of first order equations
$$\begin{aligned} \dot{x}_1 &= x_2 + u_1 \\ \dot{x}_2 &= -x_1^3 + u_2 \end{aligned} \tag{4.46}$$
where
$$x_1 = x \tag{4.47}$$
and
$$u = \dot{u}_1 + u_2 \tag{4.48}$$
The state variable, $x_2$, is not as easily identified with the original
system variables, but this is of little consequence.

Suppose that we wish to control the system (4.46) such that 


A 
| | | 
J -| (ер “(езе dt (4.49) 


is a minimum. The loss equation for this system is 


Min 
VA la 22 l 2 : 3 
p L. qua) (ара (evo tm ty res «0 (4.50) 


If we assume 
] ] 
бөө 0505) (4.51) 
then the optimal control ıs 


u, тур Ино | 
(4.52) 


Ч „ма —p(t)x,/r(¢) \ 


and equation (4.50) becomes 


РО ро но) ня | ao (4.53) 


DO 





Since this equation must be satisfied for all values of x, and x,, we 


must have 


p(t) — p2(t)/r(t) + q(t) =0 (4. 54) 


The boundary condition is 


(Т) <0 (4.55) 


This equation is identical to equation (4.12), and the results of 
section 4.3 of this chapter, including the sub-optimal control results, 
are equally applicable to this problem. 
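
As an illustration of equation (4.54) with boundary condition (4.55), the following minimal sketch integrates the equation backward from t = T and compares the result with a closed-form solution. It assumes constant weights q and r, which the text does not require; the function names and the Runge-Kutta step count are illustrative choices, not part of the original work.

```python
import math


def riccati_closed_form(t, T, q, r):
    """Solution of p' - p^2/r + q = 0 with p(T) = 0, for constant q and r."""
    rate = math.sqrt(q / r)
    return math.sqrt(q * r) * math.tanh(rate * (T - t))


def riccati_backward(T, q, r, steps=2000):
    """Integrate p' = p^2/r - q from t = T down to t = 0 (classical RK4).

    Returns the value p(0).
    """
    def slope(p):
        return p * p / r - q

    h = -T / steps            # negative step: march backward in time
    p = 0.0                   # boundary condition p(T) = 0
    for _ in range(steps):
        k1 = slope(p)
        k2 = slope(p + 0.5 * h * k1)
        k3 = slope(p + 0.5 * h * k2)
        k4 = slope(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


if __name__ == "__main__":
    print(riccati_backward(1.0, 1.0, 0.01), riccati_closed_form(0.0, 1.0, 1.0, 0.01))
```

Under the constant-weight assumption, p(t) approaches sqrt(q r) when the remaining interval T - t is long, so the time-varying feedback factor p(t)/r(t) settles toward a constant.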


Since

$$u = \dot{u}_1 + u_2 \tag{4.56}$$

the control, u, may be expressed as

$$u = \dot{u}_1 - p(t)\,x_2/r(t) \tag{4.57}$$


For the actual synthesis of the controller, however, this expression
for u is unsatisfactory because the state variable, $x_2$, has not been
identified with the original system variables. We can get around this
by expressing $x_2$ in terms of $x_1$ and $u_1$, thus

$$x_2 = \dot{x}_1 - u_1 \tag{4.58}$$
Or‏ 
rji ET p()x, /r(0 (4. 59)‏ 
The control, u, then is‏ 
u e 3p(O 3, /2r() € 2 p^ (0x ,/r* (0) (4. 60)‏ 


The block diagram for this control system is shown in figure 4. 6. 
Figure 4.6 - Nonlinear Spring Control System Block Diagram (blocks: controller; nonlinear spring system)


Admittedly, the nonlinear systems and the performance criteria 
used in this example problem and the previous one are very special. 
However, because we are able to obtain analytical solutions, a great 
deal of insight can be gained from them about the nature and behavior 
of optimal controllers in nonlinear systems. In particular, we have 
found that simple constant gain linear controllers can provide very 


near optimal performance over a wide range of conditions. 
CHAPTER V
COMPUTER RESULTS
5.1 Introduction


The results of several computer problems illustrating the methods
of Chapter II are presented in this chapter. Several variations of each
problem are presented in order to show the effect of changes in the
initial state and changes in the performance index. It should be borne
in mind that since the system being controlled is nonlinear, the
controller parameters depend on the initial state of the system.

In addition, the results of controlling some of the nonlinear systems 
with simple sub-optimal linear controllers are presented and compared 
with the optimal results. 

The results of this section were obtained on the IBM 7090 computer
at the MIT Computation Center. The Fortran programs used to obtain
the solutions for the two state-variable deterministic problems are
given in Appendix D. In all cases, the change in the performance index
from one iteration to the next was used as a convergence criterion.
When the magnitude of this change was less than one per cent of the
value of the performance index, the iteration procedure was terminated.
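
The stopping rule just described can be written as a one-line test. This is only a sketch: the text does not say whether the one per cent is taken relative to the newest or the previous value of the performance index, so the use of the newest value below is an assumption, and the function name is hypothetical.

```python
def has_converged(j_previous, j_current, tolerance=0.01):
    """True when the change in J is below `tolerance` times the newest J.

    ASSUMPTION: the reference value is the newest performance index.
    """
    return abs(j_current - j_previous) < tolerance * abs(j_current)
```

For example, a change from 1.200 to 1.195 would stop the iteration, while a change from 2.0 to 1.5 would not.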
5.2 One State-Variable Example 


The system considered for this example can be described by the
equations

$$x(k+1) = x(k) - 0.05\,x^3(k) + 0.05\,u(k); \qquad x(1) = c \tag{5.1}$$

$$y(k) = x(k) \tag{5.2}$$

The system may be thought of as the discrete time approximation of the
continuous time system

$$\dot{x}(t) = -x^3(t) + u(t); \qquad x(0) = c \tag{5.3}$$

$$y(t) = x(t) \tag{5.4}$$
The performance index used was

$$J = \frac{1}{2}\sum_{k=1}^{100} Q(k)\,[z(k) - x(k)]^2 + \frac{1}{2}\sum_{k=1}^{99} R(k)\,u^2(k) \tag{5.5}$$

The equations used as the basis of the iterative procedure for this problem
may be determined from equations (2.23), (2.27), (2.31), and (2.32).
The sub-optimal system used is given by the same equations except
that u(k) is given by

$$u(k) = G\,[z(k) - x(k)] \tag{5.6}$$

where G is a constant gain factor. Block diagrams of the optimal and
the sub-optimal control systems are given in figure 5.1.
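
The sub-optimal system is simple enough to simulate directly. The sketch below steps the system equation with a constant-gain control and accumulates the performance index, reading the index as one half the sum of Q(k)[z(k) - x(k)]^2 over k = 1..100 plus one half the sum of R(k)u^2(k) over k = 1..99. The function name and argument layout are illustrative assumptions, and Q and R are taken as constants.

```python
def simulate_constant_gain(gain, q, r, x1=1.0, z=None, n_steps=100, h=0.05):
    """Run x(k+1) = x(k) - h x(k)^3 + h u(k) with u(k) = gain * (z(k) - x(k)).

    Returns J = 1/2 sum_{k=1}^{N} q e(k)^2 + 1/2 sum_{k=1}^{N-1} r u(k)^2,
    where e(k) = z(k) - x(k).
    """
    if z is None:
        z = [0.0] * n_steps
    x = x1
    cost = 0.0
    for k in range(n_steps):            # index 0 is the text's k = 1
        error = z[k] - x
        cost += 0.5 * q * error ** 2
        if k < n_steps - 1:             # no control term at the final step
            u = gain * error
            cost += 0.5 * r * u ** 2
            x = x - h * x ** 3 + h * u
    return cost


if __name__ == "__main__":
    print(simulate_constant_gain(7.5, 1.0, 0.01))    # data quoted for figure 5.2
    print(simulate_constant_gain(15.0, 10.0, 0.01))  # data quoted for figure 5.4
```

With the data sets quoted for figures 5.2 and 5.4 in this section, this reading of the index returns approximately 1.195 and 6.386, in agreement with the sub-optimal performance indices reported there.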





Figure 5.1 - One State-Variable Control Systems (panels: Optimal Control System; Sub-optimal Control System)

Figures 5.2 through 5.10 give the plotted results from several data sets
for this system. Comments on each of the figures are given below.
Figure 5.2: For this data set, R(k) = 0.01, Q(k) = 1.00, x(1) = 1.00,
and z(k) = 0.0. Convergence was achieved in three iterations. The
linear sub-optimal control system with a gain equal to 7.5 gave a
performance index of 1.1951, just 0.1 per cent higher than the optimal.

Figure 5.3: For this data set, R(k), Q(k), x(1), and z(k) are the
same as for the previous data set except that z(k) = 1.0 for k > 50.
Convergence occurred in three iterations. The plot clearly shows
that u(k) anticipates the step in z(k), indicating the sense in which
this control system is "unrealizable." The sub-optimal control
system, which is non-anticipative, with a gain of 7.5 had a performance
index of 3.238, about 30 per cent higher than the anticipative optimal
system.

Figure 5.4: For this data set, R(k) = 0.01, Q(k) = 10.0, x(1) = 1.0,
and z(k) = 0.0. Convergence was achieved in three iterations. Notice
that since output error is relatively more important in this case, the
control effort used is higher and the system response is faster. The
sub-optimal control for this set had a gain of 15.0 and gave a
performance index of 6.386, about 0.1 per cent higher than the optimal.

Figure 5.5: For this data set, R(k) = 0.01, Q(k) = 10.0, x(1) = 1.0,
and z(k) = 0.0 for k < 50, but z(k) = 1.0 for k > 50. Convergence
was achieved in two iterations. With a gain of 15.0, the sub-optimal
system gave a performance index of 13.845, which is 13 per cent
higher than the performance index for the optimal system. When the
control system response is relatively fast, as in this case, anticipation
of the optimal system does not improve the system performance as
much.
Figure 5.6: For this data set, R(k) = 0.01, Q(k) = 10.0, x(1) = 10.0,

and z(k) = 0.0. When the initial state is as large as it is in this case,
the system is open loop unstable. (The continuous time system,
$\dot{x} + x^3 = 0$, is always stable, but the sampling introduced to make
the discrete time approximation causes the system to be unstable for
|x(1)| greater than about 6.0.) The closed loop control system is,
nevertheless, stable, at the expense of a very large performance index.
Because the discrete time system is unstable, it is not a good
representation of the continuous time system for this case. For this
reason figures 5.7 and 5.8 have been included.

Figure 5.7: For this data set, the sampling interval has been decreased
by a factor of 10 and the number of steps has been increased by a factor
of 10. This makes the system open loop stable, and once again a
reasonable discrete time approximation to the continuous time system.
Here R(k) = 0.01, Q(k) = 1.0, x(1) = 10.0, and z(k) = 0.0. Convergence
occurred in six iterations.

Figure 5.8: For this data set, the comments of the previous set apply
except that Q(k) = 10.0. Convergence occurred in four iterations.

Figure 5.9: For this data set, R(k) = 1.0, Q(k) = 1.0, x(1) = 1.0,
and z(k) = 0.0. Convergence occurred in three iterations. Because
the cost of control is so high relative to the cost of output error, the
control effort expended is small and the system response is slow. As
a matter of fact, it can be shown that for the one state-variable system
the speed of response is proportional to the ratio, Q(k)/R(k). In general,
we can expect the speed of response to depend on the ratio of the norm
of the Q(k) matrix to the R(k) matrix.

Figure 5.10: The data for this set is the same as for the last set except
that z(k) = 0.0 for k < 50 and z(k) = 1.0 for k > 50. Convergence
occurred in four iterations.
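
The open loop instability noted in the comments on figures 5.6 and 5.7 can be checked with a short calculation. With u(k) = 0 the system equation becomes x(k+1) = x(k)[1 - h x^2(k)], where h is the sampling factor (0.05 in the original data sets, 0.005 after the sampling interval is reduced tenfold). The magnitude of x grows when |1 - h x^2| > 1, that is, when |x| > sqrt(2/h). The sketch below evaluates this threshold and runs the unforced recursion; the function names and the early-exit bound of 1e6 are illustrative assumptions.

```python
import math


def open_loop_threshold(h):
    """|x| above which x(k+1) = x(k) - h x(k)^3 grows in magnitude."""
    return math.sqrt(2.0 / h)


def open_loop_peak(x1, h, n_steps):
    """Largest |x(k)| reached by the unforced recursion.

    Stops early once |x| exceeds 1e6, which is treated as divergence.
    """
    x = x1
    peak = abs(x)
    for _ in range(n_steps - 1):
        x = x - h * x ** 3
        peak = max(peak, abs(x))
        if peak > 1e6:
            break
    return peak


if __name__ == "__main__":
    print(open_loop_threshold(0.05))    # about 6.32
    print(open_loop_threshold(0.005))   # 20.0
    print(open_loop_peak(10.0, 0.05, 100) > 1e6)
    print(open_loop_peak(10.0, 0.005, 1000))
```

The threshold of about 6.3 for h = 0.05 is consistent with the "about 6.0" quoted above, and the threshold of 20 for h = 0.005 explains why x(1) = 10.0 is open loop stable at the finer sampling interval.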
5.3 Two State-Variable Examples 


The system considered for the first two state-variable example can
be described by the equations

$$x_1(k+1) = x_1(k) + 0.01\,x_2(k); \qquad x_1(1) = c_1 \tag{5.7}$$

$$x_2(k+1) = x_2(k) - 0.02\,x_1(k) - 0.03\,|x_2(k)|\,x_2(k) + 0.01\,u(k); \qquad x_2(1) = c_2 \tag{5.8}$$

$$y_1(k) = x_1(k) \tag{5.9}$$

$$y_2(k) = x_2(k) \tag{5.10}$$

A block diagram of this system is shown in figure 5.11.

Figure 5.11 - Two State-Variable Nonlinear System

The system described above may be thought of as the discrete
approximation for the continuous time system

$$\ddot{x}(t) + 3\,|\dot{x}(t)|\,\dot{x}(t) + 2\,x(t) = u(t) \tag{5.11}$$

$$y_1(t) = x(t) \tag{5.12}$$

$$y_2(t) = \dot{x}(t) \tag{5.13}$$


The performance index used was

$$J = \frac{1}{2}\sum_{k=1}^{100}\left\{ Q_1(k)\,[z_1(k) - x_1(k)]^2 + Q_2(k)\,[z_2(k) - x_2(k)]^2 \right\} + \frac{1}{2}\sum_{k=1}^{99} R(k)\,u^2(k) \tag{5.14}$$

The equations that form the basis of the iterative routine follow from
equations (2.23), (2.27), (2.31), and (2.32). The equations for the
sub-optimal systems are the same except that

$$u(k) = G_1\,[z_1(k) - x_1(k)] + G_2\,[z_2(k) - x_2(k)] \tag{5.15}$$

where $G_1$ and $G_2$ are constant gain factors.

Figures 5.12 through 5.21 give the plotted results from ten data
sets for this example. Comments on these figures follow.
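
The two state-variable system under the constant-gain control law can be simulated in the same manner as the one state-variable case. The sketch below is not one of the Fortran programs of Appendix D; the function name, the argument layout, and the treatment of Q1, Q2, and R as constants are assumptions made for illustration.

```python
def simulate_two_state(g1, g2, x1_init, x2_init, q1=1.0, q2=1.0, r=0.01,
                       z1=None, z2=None, n_steps=100, h=0.01):
    """Constant-gain control of the first two state-variable example.

    Dynamics (step h = 0.01):
        x1(k+1) = x1(k) + h x2(k)
        x2(k+1) = x2(k) - 2h x1(k) - 3h |x2(k)| x2(k) + h u(k)
    Control:  u(k) = g1 (z1 - x1) + g2 (z2 - x2)
    Returns the quadratic performance index J.
    """
    if z1 is None:
        z1 = [0.0] * n_steps
    if z2 is None:
        z2 = [0.0] * n_steps
    x1, x2 = x1_init, x2_init
    cost = 0.0
    for k in range(n_steps):                 # index 0 is the text's k = 1
        e1, e2 = z1[k] - x1, z2[k] - x2
        cost += 0.5 * (q1 * e1 ** 2 + q2 * e2 ** 2)
        if k < n_steps - 1:                  # no control term at the final step
            u = g1 * e1 + g2 * e2
            cost += 0.5 * r * u ** 2
            # Both updates use the old state on the right-hand side.
            x1, x2 = (x1 + h * x2,
                      x2 - 2.0 * h * x1 - 3.0 * h * abs(x2) * x2 + h * u)
    return cost


if __name__ == "__main__":
    for start in (1.0, 3.0, 10.0):
        print(start, simulate_two_state(8.5, 4.75, 0.0, start))
```

A step in a reference input can be supplied by passing, for instance, `z1=[0.0] * 50 + [1.0] * 50`.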


Figures 5.12 - 5.14: For these data sets, R(k) = 0.01, $Q_1(k)$ = 1.00,
$Q_2(k)$ = 1.00, $x_1(1)$ = 0.0, $z_1(k)$ = 0.0, and $z_2(k)$ = 0.0. In figure 5.12,
$x_2(1)$ = 1.0, in figure 5.13, $x_2(1)$ = 3.0, and in figure 5.14, $x_2(1)$ = 10.0.
For each of these convergence occurred in three or four iterations. The
sub-optimal control with $G_1$ = 8.5 and $G_2$ = 4.75 gave performance
indices of 5.360, 31.32, and 158.62 for $x_2(1)$ = 1.0, 3.0, and 10.0,
respectively. The sub-optimal control system performance
indices were 17.5, 7.0, and 7.5 per cent higher than the optimal
performance indices.


Figures 5.15 - 5.17: For these figures, the data were the same as
for figures 5.12 - 5.14 except that $z_1(k)$ = 0.0 for k < 50 and
$z_1(k)$ = 1.0 for k > 50. In each case convergence occurred in three
iterations.


Figure 5.18: For this data set, R(k) = 0.01, $Q_1(k)$ = 1.0, $Q_2(k)$ = 0.0,
$x_1(1)$ = 0.0, $x_2(1)$ = 1.0, $z_1(k)$ = 0.0, and $z_2(k)$ = 0.0. Convergence
occurred in five iterations. The sub-optimal control system with
$G_1$ = 11.0 and $G_2$ = 2.0 gave a performance index of 1.305,
about 5 per cent higher than the optimal system performance index.
Figure 5.19: The same data applies here as in the previous figure
except that $z_1(k)$ = 0.0 for k < 50 and $z_1(k)$ = 1.0 for k > 50.
Convergence occurred in five iterations.

Figure 5.20: For this data set, R(k) = 0.01, $Q_1(k)$ = 10.0, $Q_2(k)$ = 1.0,
$x_1(1)$ = 0.0, $x_2(1)$ = 1.0, $z_1(k)$ = 0.0, and $z_2(k)$ = 0.0. Convergence
occurred in four iterations. The sub-optimal system with $G_1$ = 28.0
and $G_2$ = 10.7 gave a performance index of 5.71, or less than
one per cent higher than that for the optimal system.

Figure 5.21: The same data applies here as in the previous figure
except that $z_1(k)$ = 0.0 for k < 50 and $z_1(k)$ = 1.0 for k > 50.
Convergence was achieved in three iterations.


The system considered for the second two state-variable example 


can be described by the equations 


x (k+l) = x(k) + 0.012 ,(k)/(1 +(x (k)|); х,(1) с, (5. 16) 
x ,(k+1) 9 x ,(k) — 0.01 x (К) + 0.01 и (К); х,(1) =с, (5. №) 
у, (К) =х, (К) (5. 18) 
y (k) = x (k) (5. 19) 


This system can be thought of as the discrete approximation to the system 


described by the block diagram below 





The performance index for this example is the same as that for the 
previous example, and the equations used in the iterative procedure are 
the same except for the system equations. 

Figures 5.22 through 5. 25 give the plotted results from four data 


sets for this system. 
Figures 5.22 - 5.24: For these data sets, R(k) = 0.01, $Q_1(k)$ = 1.0,
$Q_2(k)$ = 1.0, $x_1(1)$ = 0.0, $z_1(k)$ = 0.0, and $z_2(k)$ = 0.0. In figures 5.22,
5.23, and 5.24, $x_2(1)$ = 1.0, 3.0, and 10.0, respectively. The number
of iterations required for convergence was three, two, and two.


Figure 5.25: For this data set, R(k) - 0.01, Q (k) z 1.0, Q (k) = 0.0, 
x (1) SORO; x (1) sodas z (k) = 0.0, and 2 (К) = 0.0. ¿Hour ме ions 


were required for convergence. 


One additional variation of this problem was run in an effort to get 
some indication of under what conditions the iterative routine might not 
converge. For this purpose, the nonlinearity was made more violent by 


changing the system equations to 


x ,(k*1) 2 x ,(k) * x (k)/(1.0 «10.0 | x ,(k)]); x (I) =c, (5. 20) 
x ,(k+1) = x,(k) — 0.01 x (k) + 0.01 u(k); х,(1) «с, (а 
y (k) = x (k) (522) 
y ,(k) 2» x ,(k) (5723) 


For each of these data sets, КБЕ = 0. 01, О, = OG, oF = 00, 
and $x_1(1)$ = 0.00. For the data set with $x_2(1)$ = 10.0, convergence
occurred in five iterations. For the data set with $x_2(1)$ = 3.00,
convergence occurred in four iterations. For the data set with $x_2(1)$ = 1.00,
convergence occurred after some rather severe oscillations in the
convergence criterion, and then only after 19 iterations.

The convergence was slower when a small initial condition was used
probably because in this case the system spent more time operating in
the highly nonlinear regions.

We can conclude from this variation of the example problem that
when the nonlinearity is severe, the iterative routine may converge
slowly or not at all.

For purposes of comparing convergence rates, the value of the
performance index, J, computed on each iteration has been included
in most of the preceding figures. The value of the performance index
computed on the convergent iteration is denoted by y.
5.4 Stochastic Examples 


The results of three stochastic examples are presented in this
section. In each of these examples, the nonlinear system being
controlled is disturbed by a random input.

The computer algorithm that was used is outlined below.

Step 1. Using P(k) = 0, x(k) = 0, and r(k) = 0, the control and
the state variables are extrapolated ahead to determine u(1), ...,
u(99) and x(2), ..., x(100).

Step 2. Using the u(k) and the x(k) just determined, P(99), ...,
P(11) and x(99), ..., x(11) are computed by backward recursion.

Step 3. The control, u(1), ..., u(10), and the state, x(2), ...,
x(11), are computed with r(1), ..., r(10) taking on random values,
simulating the actual evolution of the nonlinear system.

Step 4. Using P(k) and x(k) previously determined, and
r(k) = 0, the control and the state variables are extrapolated ahead
to determine u(11), ..., u(99) and x(12), ..., x(100).

Step 5. Using the x(k) and the u(k) just determined, P(99), ...,
P(21) and x(99), ..., x(21) are computed.

Steps 3, 4, and 5 are then repeated, starting at k = 11, k = 21, etc.,
until the actual simulation has evolved to k = 100. The system should
be visualized with steps 4 and 5 simulating the controller in fast time,
and step 3 simulating the actual evolution of the nonlinear system in
real time.
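
The scheduling in the steps above (ten real-time steps with random disturbance, followed by a fast-time re-computation with the disturbance set to zero) can be sketched as follows for the one state-variable system of section 5.2. The planner used here is a stand-in and is not the set of equations from Chapter II: it freezes the coefficient 1 - 0.05 x^2 along a nominal path and runs a scalar Riccati recursion backward. The way the disturbance enters the system (through the same 0.05 factor as the control), the Gaussian distribution of r(k), and all names are assumptions.

```python
import random

H = 0.05          # sampling factor of the section 5.2 system
N = 100           # final step index
BLOCK = 10        # real-time steps between re-planning passes
Q, R = 1.0, 0.01  # weights of the data set quoted for figure 5.26


def step(x, u, r=0.0):
    # ASSUMPTION: the disturbance r(k) enters through the same factor as u(k).
    return x - H * x ** 3 + H * (u + r)


def plan(x_start, k_start, sweeps=5):
    """Stand-in planner (NOT the equations of Chapter II).

    Freezes a(k) = 1 - H * xbar(k)^2 along a nominal path, runs a scalar
    Riccati recursion backward from k = N, then re-extrapolates the nominal
    path forward with r(k) = 0. Returns feedback gains for k_start..N-1.
    """
    n = N - k_start
    nominal = [x_start] * (n + 1)
    gains = [0.0] * n
    for _ in range(sweeps):
        p = Q
        for i in reversed(range(n)):            # backward recursion
            a = 1.0 - H * nominal[i] ** 2
            denom = R + H * H * p
            gains[i] = H * p * a / denom
            p = Q + a * a * p - (a * H * p) ** 2 / denom
        x = x_start
        nominal = [x]
        for i in range(n):                      # forward extrapolation, no noise
            x = step(x, -gains[i] * x)
            nominal.append(x)
    return gains


def simulate(x1=1.0, noise_std=1.0, seed=0):
    """Return x(1), ..., x(100) with re-planning every BLOCK steps."""
    rng = random.Random(seed)
    x, k, path = x1, 1, [x1]
    while k < N:
        gains = plan(x, k)                      # controller pass in fast time
        for i in range(min(BLOCK, N - k)):      # actual evolution in real time
            x = step(x, -gains[i] * x, rng.gauss(0.0, noise_std))
            path.append(x)
        k += BLOCK
    return path


if __name__ == "__main__":
    print(simulate()[:12])
```

The design point this illustrates is that each planning pass starts from the state actually reached, so the effect of past disturbances is absorbed at every tenth step while future disturbances are replaced by their zero mean.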


Figure 5.26: The results given in this figure are for the example using
the system of section 5.2, but with an independent random disturbance,
r(k), added. For this data set, R(k) = 0.01, Q(k) = 1.0, x(1) = 1.0,


and ezi{k) = 0.0. Notige that by th& time Ко. і, Ше P and X variables 
are well determined with no more jumps, indicating that despite the
random disturbance, the control system is operating near optimally. In 
this example and the others of this section, r(k) is a zero mean, unit 


variance, independent random sequence. 


Figure 5.27: The results given in this figure are for the example using
the same system as the first example in section 5.3, but with an independent
random disturbance, r(k), added to the $x_2$ component. For this data set,
R(k) = 0.01, $Q_1(k)$ = 1.0, $Q_2(k)$ = 1.0, $x_1(1)$ = 0.0, $x_2(1)$ = 1.0, $z_1(k)$ = 0.0,
and $z_2(k)$ = 0.0.


Figure 5.28: The results given in this figure are for a nonlinear system
disturbed by dependent noise. In this example, $x_1(k)$ represents the
dependent noise which is obtained from independent noise by the system

$$x_1(k+1) = 0.95\,x_1(k) + 0.05\,r(k); \qquad x_1(1) = 0.0 \tag{5.24}$$

where r(k) is an independent random variable. The state of the nonlinear
system being controlled is represented by $x_2(k)$, and is determined by
the equation

$$x_2(k+1) = x_2(k) - 0.05\,x_2^3(k) + 0.05\,u(k) + 0.05\,x_1(k); \qquad x_2(1) = 1.0 \tag{5.25}$$
Together $x_1(k)$ and $x_2(k)$ make up an augmented two-dimensional state
vector. For this data set, R(k) = 0.01, $Q_1(k)$ = 0.0 (as we have no
control over the noise), $Q_2(k)$ = 1.0, $z_1(k)$ = 0.0, and $z_2(k)$ = 0.0.
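
The dependent noise used for figure 5.28 is produced by passing the independent sequence r(k) through the first-order recursion x(k+1) = 0.95 x(k) + 0.05 r(k) with zero initial value. The sketch below generates such a sequence and gives its steady-state variance, b^2/(1 - a^2), which follows from the recursion when r(k) has unit variance. The Gaussian distribution of r(k) and the function names are assumptions; the text specifies only zero mean, unit variance, and independence.

```python
import random


def dependent_noise(n_steps=100, a=0.95, b=0.05, seed=0):
    """Sequence x(1), ..., x(n_steps) with x(1) = 0 and x(k+1) = a x(k) + b r(k).

    ASSUMPTION: r(k) is drawn from a unit-variance Gaussian.
    """
    rng = random.Random(seed)
    x = 0.0
    out = [x]
    for _ in range(n_steps - 1):
        x = a * x + b * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def stationary_variance(a=0.95, b=0.05):
    """Steady-state variance of the recursion for unit-variance r(k)."""
    return b * b / (1.0 - a * a)


if __name__ == "__main__":
    print(stationary_variance() ** 0.5)   # steady-state standard deviation
    print(dependent_noise()[:5])
```

For a = 0.95 and b = 0.05 the steady-state standard deviation is about 0.16, and successive samples are strongly correlated (correlation 0.95 at lag one).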
CHAPTER VI
CONCLUSIONS 


The major contribution of this work has been the presentation of a 
theory along with an iterative algorithm for the solution of optimal 
nonlinear control problems subject to quadratic performance criteria. 

In addition, the results of the computer examples presented in Chapter V 
have demonstrated the feasibility of the method. 

A by-product of the theory has been the analytic solution of the 
problems of Chapter IV. In Chapters IV and V, comparisons of sub- 
optimal systems with the optimal ones determined by the theory have 
shown that often near-optimal performance is possible with simple 
linear controllers, a possibility that has been suspected but not 
demonstrated previously. 

All is not rosy, however. Appendix C shows that the method is 
essentially limited to problems of no more than five state variables 
and control intervals of no more than 1000 steps by the size and speed 
of presently available digital computers. 

Many questions have been raised, but not answered. Of prime importance
among these is the question of under what conditions the convergence
of the iterative algorithm can be guaranteed. Further research on the problem
with stochastic disturbances is required in order to determine under
what conditions the control procedure presented in section 2.7 is
reasonable.

It would be highly desirable to be able to rephrase the problem in such
a way that the optimal control system determined by the theory would be
restricted to be non-anticipative. This problem has been worked on
briefly by the author, but without results.

Finally, although it is conceivable that actual control systems may be 
synthesized by this method, it is far more likely that the main use for the 
theory will be to establish ultimate performance figures for comparison 
purposes in design studies. Further research in this direction seems 


warranted. 
APPENDIX A
TWO MATRIX IDENTITIES
Theorem A1: If $R^{-1}$ and $[R + G'PG]^{-1}$ exist, then


вс ê Û HE Bel" G (А. 1) 


Proof: The proof uses a method of matrix manipulations given by Cox.22 
This method views a matrix as a linear transformation and shows that 
such transformations obey all the rules for block diagram manipulation 
provided order of blocks is preserved. In other words, block diagram 
manipulations may be used to prove matrix identities. 

For the proof of this theorem, it is easy to show that the expression 
on the right-hand side of equation (A. 1) can be represented by the block 


diagram 





provided $R^{-1}$ and $[R + G'PG]^{-1}$ exist.


By moving G into the loop we get

Then moving $R^{-1}$ and G back out the other side of the loop gives





зве" РІ" R-! Es (A. 2) 


which proves the theorem.

Theorem A2: If $R^{-1}$ and $[R + G'PG]^{-1}$ exist, then

$$[I + GR^{-1}G'P]^{-1} = I - G[R + G'PG]^{-1}G'P \tag{A.3}$$

Proof: The proof proceeds by using the definition of an inverse.
Thus if the right-hand side of (A.3) is truly the inverse of $I + GR^{-1}G'P$
then we must have

$$[I + GR^{-1}G'P]\,[I - G[R + G'PG]^{-1}G'P] = I \tag{A.4}$$

or

$$I + GR^{-1}G'P - G[R + G'PG]^{-1}G'P - GR^{-1}G'PG[R + G'PG]^{-1}G'P = I \tag{A.5}$$

By regrouping terms we get

$$I + G\{R^{-1} - [R + G'PG]^{-1} - R^{-1}G'PG[R + G'PG]^{-1}\}G'P = I \tag{A.6}$$

But since $[R + G'PG]^{-1}$ exists, we can write

$$I + G\{R^{-1}[R + G'PG] - I - R^{-1}G'PG\}[R + G'PG]^{-1}G'P = I \tag{A.7}$$

The bracketed term is the zero matrix, hence

$$I = I \tag{A.8}$$

proving that $I - G[R + G'PG]^{-1}G'P$ is indeed the inverse given in (A.3).
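
The identity of Theorem A2 can be spot-checked numerically. The sketch below does so for a single-input case, where G is an n x 1 column and R is a scalar, so that the inverse of R + G'PG is an ordinary division. The helper names and the sample matrices are illustrative assumptions; a numerical check of one case is of course not a proof.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def a2_residual(G, P, R):
    """Largest entry of [I + G R^-1 G'P][I - G (R + G'PG)^-1 G'P] - I.

    G is an n x 1 column, P is n x n, and R is a nonzero scalar.
    """
    n = len(P)
    Gt = [[row[0] for row in G]]               # G' as a 1 x n row
    GtP = matmul(Gt, P)                        # G'P, 1 x n
    s = R + matmul(GtP, G)[0][0]               # scalar R + G'PG
    M = matmul(G, GtP)                         # G G'P, n x n
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    left = [[eye[i][j] + M[i][j] / R for j in range(n)] for i in range(n)]
    right = [[eye[i][j] - M[i][j] / s for j in range(n)] for i in range(n)]
    prod = matmul(left, right)
    return max(abs(prod[i][j] - eye[i][j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    G = [[1.0], [2.0]]
    P = [[2.0, 0.5], [0.5, 1.0]]
    print(a2_residual(G, P, 0.01))   # should be at rounding-error level
```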
APPENDIX B
STABILITY


In the design of any control system, the question of stability is of
paramount importance. For this reason, the stability of control systems
synthesized using the theory of Chapters II and III is considered here
briefly. For simplicity we shall consider first the continuous time system
and use the second method of Lyapunov.

For the unperturbed control system (i.e., z(t) = 0), the value
function (3.24) is positive definite, provided f = 0 and h = 0 when
x(t) = 0 and u(t) = 0. In addition V(x(t), t) approaches infinity as x(t)
approaches infinity.

The derivative of V with respect to time along an optimal trajectory
is given by


Vix(t), «Уү «Уг Хүр) (в. 1) 
or, by (3. 19), 


. l 2 l 2 
Và(0,0 »- Пао, 7 5 ПО (B. 2) 


The right-hand side of equation (B.2) is non-positive definite. A function
which possesses these properties is called a Lyapunov function, and the
second method of Lyapunov states that when a Lyapunov function exists
for a system, the system is stable.

As a matter of fact, $\dot{V}(x(t), t)$ is usually negative definite, although
it is difficult to give general conditions under which this is true. In this
case the second method of Lyapunov guarantees that the system will be
asymptotically stable.

For the discrete time control system, analogous results can be
drawn using a discrete version of the second method of Lyapunov.
APPENDIX C


COMPUTATIONAL CONSIDERATIONS 


C. 1 Computer Storage Requirements 


The discrete time problem is analyzed in this section to determine 
computer storage requirements, and in the next section to determine com- 
puter time requirements. Because we are attempting to get approximate 
answers, many simplifying assumptions will be made. 

The first assumption we will make is that we are interested in
computing the optimum control only. For instance we are not interested
in computing a(k). By considering equations (2.23), (2.27), (2.31),
and (2.32), we can determine the computer storage requirements for
the iterative algorithm of section 2.4. These requirements are given
in Table I.
Table I

Variables     Number of Registers Required
P(k)          (1/2) n(n+1) N
x(k)          nN
x(k)          nN
u(k)          rN
Total         [(1/2) n(n+1) + 2n + r] N


Assuming a single input system, that is, r = 1, and for a computer
with 50,000 registers, the dimension of n must be less than 5 and
N = 1000 in order to fit the problem on the computer. For the same
computer with N = 100, the dimension of n must be less than 20.
Even from this quick look into the storage requirements aspect of the
problem, we can immediately see that the method is going to be severely
restricted by the size of present day computers.
C.2 Time Requirements


For computer time requirements, we will determine the total number 
of mathematical operations involved in one iteration of the algorithm. We 
will assume that all operations require the same amount of time. The 
total time required can then be determined by multiplying the total number 
of operations by the average time required per operation. In addition, to 
simplify matters more, we will assume that the input u(k) is a scalar 
Ш.®., r= 1) and enters in only One componentsor г. 

Table II was determined by examination of the same equations as 


were used in determining Table I. 


Table II 
Variables Number of Operations 
P (k) (ані) 7n 213m ?)N 
x (k) n (7n ?+3m ?)N 
x(k) 2n (n+m)N (estimated) 
u(k) (3n ?+2n)N 
Total |5 +20 (m+1)+ n (a43) 7а “за х 


As an example, suppose N = 1000 and n = m = 10. The total number
of operations would be on the order of 6 x 10^7. If the computer could
process, on the average, one operation every ten microseconds, the total
time required for one iteration would be about ten minutes. Again the
limitations of this algorithm using present day computers become
plainly evident.

As a second example suppose N = 1000 but n = m = 5. Then the
total number of operations required for one iteration would be on
the order of 4.5 x 10^6. At a computer speed of one operation every
ten microseconds, this would require about 45 seconds per iteration.
These figures are somewhat conservative because they neglect the
time saving possible when repeated factors are encountered.
Nevertheless, the figures agree in order of magnitude with the times
observed on actual computer problems. (The actual computer times are
about one-half to two-thirds of that predicted.)

From these example problems, we can conclude that a problem with
5 state variables and N = 1000 steps represents about the largest
size problem that can be handled by this algorithm with presently
available computers.
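
The timing estimates of this section are a single multiplication: the number of operations per iteration times the average time per operation. The sketch below reproduces the two worked examples, taking the operation counts as implied by the stated times (ten minutes and 45 seconds at ten microseconds per operation); the function names are illustrative assumptions.

```python
def iteration_time_seconds(operations, seconds_per_operation=10e-6):
    """Predicted time for one iteration of the algorithm."""
    return operations * seconds_per_operation


def observed_range_seconds(operations, seconds_per_operation=10e-6):
    """Range implied by the remark that actual times were about
    one-half to two-thirds of the predicted time."""
    predicted = iteration_time_seconds(operations, seconds_per_operation)
    return predicted / 2.0, predicted * 2.0 / 3.0


if __name__ == "__main__":
    print(iteration_time_seconds(6e7) / 60.0)   # n = m = 10: about ten minutes
    print(iteration_time_seconds(4.5e6))        # n = m = 5: about 45 seconds
    print(observed_range_seconds(4.5e6))
```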
APPENDIX D


FORTRAN PROGRAMS FOR TWO STATE-VARIABLE EXAMPLES 